Why glTF is the JPEG for the metaverse and digital twins

The JPEG file format played a crucial role in transitioning the web from a world of text to a visual experience through an open, efficient container for sharing images. Now, the graphics language transmission format (glTF) promises to do the same thing for 3D objects in the metaverse and digital twins. 

JPEG took advantage of various compression tricks to dramatically shrink images compared to earlier formats like GIF. The latest version of glTF similarly takes advantage of techniques for compressing both the geometry of 3D objects and their textures. glTF is already playing a pivotal role in ecommerce, as evidenced by Adobe’s push into the metaverse. 

VentureBeat talked to Neil Trevett, president of the Khronos Group, which stewards the glTF standard, to find out more about what glTF means for enterprises. He is also vice president of developer ecosystems at Nvidia, where his job is to make it easier for developers to use GPUs. He explains how glTF complements other digital twin and metaverse formats like universal scene description (USD), how to use it and where it’s headed. 

VentureBeat: What is glTF and how does it fit into the ecosystem of file formats for the metaverse and digital twins?

Neil Trevett: At Khronos, we put a lot of effort into 3D APIs like OpenGL, WebGL and Vulkan. We found that every application that uses 3D needs to import assets at some point or another. The glTF file format is widely adopted and very complementary to USD, which is becoming the standard for creation and authoring on platforms like Omniverse. USD is the place to be if you want to put multiple tools together in sophisticated pipelines and create very high-end content, including movies. That is why Nvidia is investing heavily in USD for the Omniverse ecosystem. 

On the other hand, glTF focuses on being efficient and easy to use as a delivery format. It is a lightweight, streamlined and easy-to-process format that any platform or device can use, down to and including web browsers on mobile phones. The tagline we use as an analogy is that “glTF is the JPEG of 3D.” 
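
As an illustration of that claim (ours, not from the interview), here is a minimal sketch of loading a glTF asset in a browser with the open-source three.js library; the model URL is a placeholder:

```typescript
import * as THREE from 'three';
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader.js';

// Standard three.js setup: scene, camera and a WebGL renderer.
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(50, innerWidth / innerHeight, 0.1, 100);
camera.position.set(0, 1, 3);
const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(innerWidth, innerHeight);
document.body.appendChild(renderer.domElement);
scene.add(new THREE.AmbientLight(0xffffff, 1));

// One call fetches and parses the glTF binary; 'chair.glb' is a placeholder URL.
new GLTFLoader().load('chair.glb', (gltf) => {
  scene.add(gltf.scene);
  renderer.setAnimationLoop(() => renderer.render(scene, camera));
});
```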

It also complements the file formats used in authoring tools. For example, Adobe Photoshop uses PSD files for editing images. No professional photographer would edit JPEGs because a lot of the information has been lost. PSD files are more sophisticated than JPEGs and support multiple layers. However, you would not send a PSD file to my mom’s cellphone. You need JPEG to get it out to a billion devices as efficiently and quickly as possible. So, USD and glTF similarly complement each other. 

VentureBeat: How do you go from one to another?

Trevett: It’s essential to have a seamless distillation process, from USD assets to glTF assets. Nvidia is investing in a glTF connector for Omniverse so we can seamlessly import and export glTF assets into and out of Omniverse. At the glTF working group at Khronos, we are happy that USD fulfills the industry’s needs for an authoring format because that is a huge amount of work. The goal is for glTF to be the perfect distillation target for USD to support pervasive deployment.

An authoring format and a delivery format have quite different design imperatives. The design of USD is all about flexibility. This helps compose things to make a movie or a VR environment. If you want to bring in another asset and blend it with the existing scene, you must retain all the design information. And you want everything at ground truth levels of resolution and quality. 

The design of a transmission format is different. For example, with glTF, the vertex information is not very flexible for reauthoring. But it is transmitted in precisely the form that the GPU needs to run that geometry as efficiently as possible through a 3D API like WebGL or Vulkan. So, glTF puts a lot of design effort into compression to reduce download times. For example, Google has contributed its Draco 3D mesh compression technology and Binomial has contributed its Basis Universal texture compression technology. We are also beginning to put a lot of effort into level of detail (LOD) management, so you can download models very efficiently. 
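
To make that concrete, here is a hedged sketch (ours, not from the interview) using glTF-Transform, an open-source glTF toolchain, to re-encode a model's geometry with Draco; the file paths are placeholders and the library's defaults stand in for the quality "dial" discussed below:

```typescript
import { NodeIO } from '@gltf-transform/core';
import { ALL_EXTENSIONS } from '@gltf-transform/extensions';
import { draco } from '@gltf-transform/functions';
import draco3d from 'draco3dgltf';

// Register the Draco codec so the IO layer can read and write
// meshes using the KHR_draco_mesh_compression extension.
const io = new NodeIO()
  .registerExtensions(ALL_EXTENSIONS)
  .registerDependencies({
    'draco3d.encoder': await draco3d.createEncoderModule(),
    'draco3d.decoder': await draco3d.createDecoderModule(),
  });

const doc = await io.read('model.glb');  // placeholder input path
await doc.transform(draco());            // re-encode geometry with Draco defaults
await io.write('model-draco.glb', doc);  // often several times smaller on the wire
```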

Distillation helps go from one file format to the other. A large part of it is stripping out the design and authoring information you no longer need. But you don’t want to reduce the visual quality unless you really have to. With glTF, you can retain the visual fidelity, but you also have the choice to compress things down when you are aiming at low-bandwidth deployment. 

VentureBeat: How much smaller can you make it without losing too much fidelity?

Trevett: It’s like JPEG, where you have a dial for increasing compression with an acceptable loss of image quality, only glTF has the same thing for both geometry and texture compression. If it’s a geometry-intensive CAD model, the geometry will be the bulk of the data. But if it is more of a consumer-oriented model, the texture data can be much larger than the geometry. 

With Draco, shrinking data by five to 10 times is reasonable without any significant drop in quality. There is something similar for textures, too. 

Another factor is the amount of memory it takes, which is a precious resource on mobile phones. Before we implemented Binomial’s compression in glTF, people were sending JPEGs, which is great because they are relatively small. But the process of unpacking them into full-sized textures can take hundreds of megabytes for even a simple model, which can hurt the power and performance of a mobile phone. glTF textures let you take a JPEG-sized supercompressed texture and immediately unpack it into a GPU-native texture, so it never grows to full size. As a result, you reduce both the data transmitted and the memory required by five to 10 times. That can help if you’re downloading assets into a browser on a cellphone.
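
The arithmetic is easy to check: a 2,048 × 2,048 RGBA texture decoded from a JPEG occupies 2,048 × 2,048 × 4 bytes ≈ 16.8MB of GPU memory (about a third more with mipmaps), while the same texture transcoded from Basis Universal into a GPU-native block-compressed format needs roughly 0.5 to 1 byte per pixel. As a hedged illustration (ours, not from the interview), here is how such a texture is consumed on the web with three.js’s KTX2Loader; the file paths are placeholders:

```typescript
import { WebGLRenderer } from 'three';
import { KTX2Loader } from 'three/examples/jsm/loaders/KTX2Loader.js';

const renderer = new WebGLRenderer();

// KTX2Loader transcodes the supercompressed .ktx2 file to whatever
// block-compressed format this GPU supports (BCn, ETC, ASTC...), so the
// texture stays compressed in video memory instead of inflating to raw RGBA.
const ktx2 = new KTX2Loader()
  .setTranscoderPath('basis/')  // placeholder path to the Basis transcoder binaries
  .detectSupport(renderer);

ktx2.load('albedo.ktx2', (texture) => {
  console.log('GPU-native compressed texture ready:', texture);
});
```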

VentureBeat: How do people efficiently represent the textures of 3D objects?

Trevett: Well, there are two basic classes of texture. One of the most common is just image-based textures, such as mapping a logo image onto a t-shirt. The other is procedural texture, where you generate a pattern, like marble, wood, or stone, just by running an algorithm.

There are several algorithms you can use. For example, Allegorithmic, which Adobe recently acquired, pioneered an interesting technique for generating textures that is now used in Adobe Substance Designer. You often bake this kind of texture into an image because it’s easier to process on client devices. 
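
To make the idea of baking a procedural pattern into an image concrete, here is a minimal sketch (our illustration, far simpler than Substance's actual algorithms) that evaluates a checker function per pixel and writes the result into a raw RGBA buffer, which could then be encoded as a PNG or JPEG texture:

```typescript
// Bake a procedural pattern into a raw RGBA pixel buffer.
// Real tools use much richer noise functions (marble, wood, stone);
// this checker only illustrates the baking step itself.
function bakeChecker(size: number, tiles: number): Uint8Array {
  const pixels = new Uint8Array(size * size * 4);
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      const cx = Math.floor((x / size) * tiles);
      const cy = Math.floor((y / size) * tiles);
      const v = (cx + cy) % 2 === 0 ? 220 : 40; // alternate light/dark tiles
      const i = (y * size + x) * 4;
      pixels[i] = pixels[i + 1] = pixels[i + 2] = v; // gray RGB value
      pixels[i + 3] = 255;                           // opaque alpha
    }
  }
  return pixels;
}

const texture = bakeChecker(512, 8); // 512×512 image, 8×8 checkerboard
```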

Once you have a texture, you can do more with it than just slapping it on the model like a piece of wrapping paper. You can use those texture images to drive a more sophisticated material appearance. For example, physically based rendering (PBR) materials try to emulate the characteristics of real-world materials as closely as possible. Is it metallic, which makes it look shiny? Is it translucent? Does it refract light? Some of the more sophisticated PBR algorithms can use up to five or six different texture maps feeding in parameters that characterize how shiny or translucent the material is. 
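
Those parameter maps appear directly in the glTF 2.0 material schema. Here is a minimal five-map PBR material, shown as a TypeScript object literal for readability (in the file it is plain JSON); each index is a placeholder pointing into the asset's textures array:

```typescript
// A glTF 2.0 PBR material referencing five texture maps, per the core spec.
const material = {
  name: 'painted_metal',
  pbrMetallicRoughness: {
    baseColorTexture: { index: 0 },          // surface color
    metallicRoughnessTexture: { index: 1 },  // G = roughness, B = metallic
    metallicFactor: 1.0,
    roughnessFactor: 1.0,
  },
  normalTexture: { index: 2 },     // fine surface detail
  occlusionTexture: { index: 3 },  // baked ambient shadowing
  emissiveTexture: { index: 4 },   // self-illumination
};
```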

VentureBeat: How has glTF progressed on the scene graph side to represent the relationships within objects, such as how car wheels might spin or connect multiple things?

Trevett: This is an area where USD is a long way ahead of glTF. Until now, most glTF use cases have been satisfied by a single asset in a single file. 3D commerce is a leading use case: you want to bring up a chair and drop it into your living room, as Ikea’s app does. That is a single glTF asset, and many use cases have been satisfied with that. As we move toward the metaverse and VR and AR, people want to create scenes with multiple assets for deployment. An active area of discussion in the working group is how best to implement scenes with multiple glTF assets and how to link them. It will not be as sophisticated as USD, since the focus is on transmission and delivery rather than authoring. But glTF will have something to enable multi-asset composition and linking in the next 12 to 18 months.
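
Until that lands in the format itself, multi-asset scenes are typically composed at the application level. A minimal sketch (ours, not from the interview) using three.js to load two independent glTF assets into one scene graph; the asset URLs are placeholders:

```typescript
import { Scene } from 'three';
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader.js';

const scene = new Scene();
const loader = new GLTFLoader();

// Compose independent glTF assets into one scene at the app level;
// the glTF format itself does not yet describe cross-asset links.
async function place(url: string, x: number): Promise<void> {
  const gltf = await loader.loadAsync(url);
  gltf.scene.position.set(x, 0, 0);
  scene.add(gltf.scene);
}

await place('chair.glb', -1);  // placeholder asset URLs
await place('table.glb', 1);
```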

VentureBeat: How will glTF evolve to support more metaverse and digital twins use cases?

Trevett: We need to start bringing in things beyond just physical appearance. We have geometry, textures and animations today in glTF 2.0. The current glTF does not say anything about physical properties, sounds or interactions. I think a lot of the next generation of glTF extensions will add those kinds of behaviors and properties. 

The industry is deciding right now that it’s going to be USD and glTF going forward. There are older formats like OBJ, but they are beginning to show their age, and popular formats like FBX are proprietary. USD is an open-source project and glTF is an open standard. People can participate in both ecosystems and help evolve them to meet their customer and market needs. I think the two formats will evolve side by side. The goal now is to keep them aligned and keep the distillation process between them efficient.