Unified Rendering LOD, Part 1 (from The Inner Product)

The Inner Product, March 2003
Jonathan Blow (jon@number-none.com)
Last updated 16 January 2004

Unified Rendering Level-of-Detail, Part 1

Related articles:

Rendering Level-of-Detail Forecast (August 2002)
Unified Rendering LOD, Part 2 (April 2003)
Unified Rendering LOD, Part 3 (May 2003)
Unified Rendering LOD, Part 4 (June 2003)
Unified Rendering LOD, Part 5 (July 2003)

Last year, in the article "Rendering Level-of-Detail Forecast" (August 2002), I discussed a few of the most popular methods of LOD management. I opined that most LOD schemes are too complicated for the benefit they provide, and I gave a high-level overview of the method I would choose to implement LOD management over large, generalized scenes.

Now, I'm going to put my money where my mouth is, and implement such a scheme. It's a big project; the resulting source code will be the biggest this column has seen, by far.

Goals of the System

We would like the system to reduce the number of triangles used to render distant objects, where precise detail is not necessary. Some algorithms, like the Lindstrom-Koller and ROAM descendants, do this by modifying the world mesh at a granularity of individual triangles. These algorithms are unacceptably slow at high detail levels. Also, many of these systems try to utilize frame coherence, which makes them unsuitable for interactive systems. (This may sound like a controversial statement because frame coherence is a standard tool in the graphics guy's optimization toolbox; but that is mainly because the culture of graphics optimization has its origin in tools for batch rendering, for film and similar applications. In games, we unquestioningly accepted frame coherence exploitation as a good thing; but it inherently produces unstable frame rates, and frame rate stability is a major priority for smooth interactivity. Look for an explanation of this, and a corresponding rant, in a future column.)

So, we want the system to operate at a large granularity, raising or lowering the detail of big chunks of geometry all at once. Thatcher Ulrich's appropriately-named "Chunked LOD" system works like this, and it runs very quickly. However, it uses the "binary triangle tree" tessellation, the same one that ROAM uses. This tessellation is very inefficient about how it allocates its triangles (though you won't find a serious efficiency analysis in any research paper promoting binary triangle trees!). Also, it means that regular-grid height fields are the only kind of geometry on which the algorithm can be easily used. With a lot of work, the algorithm can be extended to other topologies. But the amount of work necessary is large, it complicates the runtime system tremendously, and the system still won't handle arbitrary input meshes.

Thatcher chose the BTT topology to make it easy to fill the cracks between neighboring chunks of terrain. This is important because cracks look awful. We want to fill cracks correctly, and quickly, but without restricting the mesh topologies. Arbitrary mesh topology is important because we want to create a method of _unified_ LOD management, which can be used for expansive outdoor terrains, or small enclosed dungeons, or for animated characters.

Aside from topology, we want the system to place a minimal number of other constraints on mesh storage and rendering. An example of such constraints is seen with progressive meshes. It is a complicated and somewhat inefficient task to render a progressive mesh as a series of triangle strips; it is nigh impossible to render it as a series of vertex-cache-optimized triangle strips. If we introduce too many constraints, then when we come along to add new features to our engine, like stencil shadows or normal-mapping geometry enhancement, we may find that constraints of the new features clash with the old LOD constraints. This forces us to eliminate the new feature, or to dump our LOD system and make a new one. Either outcome is bad.

Finally, we want the system not to exhibit popping from one LOD to another. Popping looks ugly and is distracting to players; often we are looking for movement in our surroundings, and LOD popping creates false movement. Because we are choosing to adjust detail at a large mesh granularity, we will naturally get a lot of popping unless we spend significant effort to avoid it.

To summarize, I'll list all the main goals now as bullet points:

Reduce detail of distant geometry
Work at a large granularity
Don't produce visible popping
Don't produce cracks between chunks of geometry
Don't constrain mesh topology
Don't constrain the application's choice of mesh representation more than necessary
- Be adaptable to future techniques

Reducing the detail of distant geometry

My overall approach will be to cut the world geometry into pieces at preprocess time, and generate several detail levels for each piece. At runtime, I will choose an appropriate detail level for each piece and then render it, filling in the cracks.

To generate the various levels of detail, I will use Garland-Heckbert Error Quadric Simplification, or "EQS" (see For More Information). The input to EQS is a mesh of arbitrary topology, and it produces a mesh of arbitrary topology. We like that since it doesn't restrict us.

When reducing the mesh, I use what Garland and Heckbert call "subset placement": I pick a vertex, then drag it to the position of a nearby vertex, then remove all the triangles that have become degenerate. The alternative is "optimal placement", where you solve for the position of a new vertex, then move two vertices into that new position. Subset placement is easier to program than optimal placement, but it probably produces a lower-quality output triangulation. Nevertheless, I chose subset placement because of its extreme generality. The output vertices are a subset of the input vertices, which means that they comprise valid and undistorted mesh data. With optimal placement, you must interpolate and extrapolate to produce the values for each vertex (not just position, but texture coordinates, perhaps tangent frames, bone blending weights for animated meshes, or other arbitrary vertex-associated data). Linear interpolation might not be good enough, or might require specialized renormalization; extrapolated values might need to be bounded in application-specific ways. It is definitely possible to do these things, but there's still no guarantee that the machine-generated coordinates will look good. It costs a lot of extra work and software complexity, to gain a potentially small visual benefit. Also, since the work is highly dependant on the type of data stored at each vertex, it's difficult to make a generalized tool that operates in a "hands-off" manner.

To be clinical: the choice to use optimal placement creates extra dependencies between the mesh simplification tool and the data format. Since dependencies are the bane of software engineering and project management, it is best to use subset placement in almost all cases.

Dividing the geometry into pieces

For now, in order to make it easy to divide the world into blocks, and to simplify other tasks like crack-filling, we'll limit ourselves to rendering height fields. By the end of this series, though, we'll be operating on triangle soups. There's a fine line to walk here -- I want to make the system simple enough that I can progress quickly programming it, so I start with just height fields; but at the same time, I don't want this to be a permanent restriction, so I must be careful not to rely on techniques that cannot be extended beyond height fields. In other words, for every simplification I impose now, I need to have a believable story about how that piece of the system will be upgraded in the future. It's a sort of algorithmic bootstrapping, if you will.

With a height field, it's easy to divide the world: you have some big array of height samples for the entire world, and you copy out rectangular subsections of that array. Because the array is evenly-sampled, we can easily match up the vertices along the edges of the blocks, which is necessary for crack-filling (discussed later). Note that there is no limit on the size or aspect ratio of these subsections; you can choose them arbitrarily based on your needs. This contrasts starkly with the binary triangle tree algorithms, which want your blocks to be square and power-of-two-plus-one in size, which is often inconvenient.

In a future upgrade, to handle unrestricted input meshes that extend arbitrarily into all three dimensions, I'll clip the meshes against axis-aligned planes to divide them into a bunch of cube-shaped regions. When clipping triangles against planes, we create all the new vertices ourselves, and in fact we create them in pairs. By saving the information about which vertices correspond, we will make it easy to perform crack-filling. But that's a subject for a future article.

Operating at a large granularity

Suppose we choose subsections of the height field that are 21x21 samples. That gives us 20x20 quads, or 800 triangles per block. Suppose we use mesh simplification to generate low-res versions of the block at 400 triangles, 200, 100, 50, and 25. Based on the distance from each block to the camera, we choose one of these detail levels and render it. The result is shown in Figure 1a. It's bad because most of the rendered blocks consist of a small number of triangles. Current graphics hardware and drivers do not like that very much; the game will run slowly.

To solve this problem, we can hierarchically combine terrain blocks before putting them into the EQS routine to reduce them. Since we're working on a height field, I chose to combine 4 blocks at a time, prior to simplification. If the original blocks are 800 triangles each, I combine 4 of them to get a 3200-triangle block; then I simplify tha mesh back down to 800 triangles. Thus all the rendered blocks have the same number of triangles, regardless of scale, as you see in Figure 1b. This simplifies some of the math we'll look at later on.

With fully 3D input geometry, I would be combining 8 blocks into 1, requiring a more extreme reduction ratio. You may wish to think about the implications.


Figure 1a: The squares represent blocks of our world mesh; the numbers denote how many triangles make up each block. If we merely reduce the detail levels of a fixed set of blocks, we will inefficiently render blocks that contain small numbers of triangles.	Figure 1b: By combining blocks, we can eliminate this problem. Note that the total number of triangles in 1b is higher than in 1a; we will worry about this in a future article.

Filling Cracks

After rendering two blocks, we need to fill the gap between them by rendering an appropriate set of connecting triangles. I will call this set of triangles a "seam". Because all of our blocks are precomputed, we can also precompute the seams. At runtime we only render precomputed arrays, which is very fast.

Most people think of the crack-filling problem in a way that makes it needlessly difficult. They think about how, when given two arbitrary-LOD blocks, to match up the vertices along their edges. This is a difficult problem. Fortunately, though, we don't need to solve it. Instead, we can first build seams between the highest-LOD blocks. This is trivial; you just iterate along the array and generate a row of triangles (see Figure 2b). For each vertex of each seam triangle, you store an integer telling you which block the vertex is connected to, and another integer telling you the index of the vertex inside that block. So we're only storing indices, not spatial coordinates. That's important for the next step.

Suppose we reduce one of the blocks, as in Figure 2c. All we need is for the EQS routine to tell us which vertex in the source mesh corresponds to each vertex in the destination mesh. That's especially easy since we used subset placement; when we collapse a vertex into another vertex, we simply record the index of where it went. When detail reduction is complete, we return an array that tells us "For each index in the source mesh, here is the index of the corresponding vertex in the output mesh". Now we use this array to re-map the indices of the seam triangles, and we throw away any degenerate triangles. Now we have a valid precomputed seam for two blocks at differing resolutions. We can repeat this process as often as we want, reducing either block as much as we want. When we combine blocks together, we merge their seams. (Seams that end up interior to the block are tossed in with the block's regular geometry; seams on the borders are merged together to make bigger seams).

The seams between the highest-resolution blocks, which we generated from the original height field sampling pattern, are not necessary for rendering. If we were to render them, all their triangles would be collapsed to 0 area; Figure 2 is drawn with the blocks artifically pulled apart to make the filling pattern clear. So we throw those seams away after the preprocess, keeping only the seams that involve detail-reduced blocks.


Figure 2a: Two blocks of terrain, connected by a seam. The blocks are drawn with an exaggerated gap between them; in actual game rendering, they would be touching. The seam is drawn with intentionally strange coloring to show the individual triangles.

Figure 2b: A wireframe version of 2a.	Figure 2c: The right-hand block has been reduced to 1/4th its original triangle count. The seam is created by cross-referencing the seam of 2b through an index map provided by the mesh simplifier.

Now the question arises: how many of these seams do we need to precompute, and how do we organize them in memory? Clearly we need an appropriate seam between a given block and any of its possible neighbors. To reduce the number of possibilities, I decided to place a restriction on the LODs of rendered blocks: two neighboring blocks are not allowed to differ by more than one level of detail. So for any given block, all his neighbors will either be the same size, one-half his size (measuring just the length of an edge, not area), or double his size.

If every block stored all the seams for all its neighbors, we'd be storing every seam twice: both block A and block B would store the seam that attaches A to B. Instead, I store only seams for neighbors in the +X and +Y directions.

In the end, each block has eight seam pointers. There are 4 for the +X direction and 4 for the +Y. The four pointers are: seam to lower-resolution block, seam to same-resolution block, and two seams to the two higher-resolution neighbor blocks.

After all this seam-filling, there will still be some small holes in the terrain, where the corners of blocks meet, as seen in Figure 3. For the case of a height map, these can be filled very quickly at runtime. I won't explain the details now, since the procedure will change significantly when we adapt the algorithm to generalized meshes.

Figure 3: Three terrain blocks (gray with black borders) and the seam fills between them (red with dotted borders). Note the hole in the middle. These blocks are drawn with exaggerated gaps; the actual hole would be very small.

Sample Code

This month's sample code renders a height field with filled seams. It also does some work to eliminate popping. Next month we'll discuss popping in detail, and look at methods of choosing which LOD to use for a given block.