Mathematical Growing Pains (from The Inner Product)

The Inner Product, February 2002
Jonathan Blow (jon@number-none.com)
Last updated 17 January 2004

Mathematical Growing Pains

Related articles:

None

This month we're going to talk about how to represent the sorts of linear entities that we manipulate all the time, like lines and planes.

In high school I was taught that the equation (y = mx + b) is a groovy way to represent a line in 2D. The equation is useful because the 'm' represents the slope and the 'b' is the y-intercept -- that is, the line intersects the y axis at (0, b). This representation is good if you don't have a lot of higher math experience and you just want to draw a line on a piece of graph paper: 'b' gives you a starting point, and 'm' gives you the direction to go from there.

Years passed until one day I was programming some pretty advanced 2D games; by then I had used (y = mx + b) for visualization so often that I thought of it as the primary way to talk about lines. So I tried make systems that represented lines with two floating point numbers, 'm' and 'b'.

But what happens when a line is vertical? Its slope is undefined. In that case, in high school you'd just write (x = k), which seemed simple enough. But when you start writing games, you have to think about more complex situations, like lines that are smoothly rotating from frame to frame. And you're writing code that uses limited-precision numbers, so your computations become numerically ill-conditioned when the lines are steep, because 'm' is such a huge number. To fix this, you put a bunch of 'if' statements into your code to change the computation based on what neighborhood the slope is in. That sucks from a software engineering standpoint, and the computational discontinuities (these happen as your parameters cross from one 'if' case into another) may cause subtle but disturbing things to happen in your game. See the pseudocode in Listing 1.

These problems go away when you make a simple mental adjustment and use (ax + by + c = 0) as your line equation. This is like the slope/intercept equation, but before a division has taken place: if you divide (ax + by + c = 0) by b (the 'b' from this equation, not the 'b' we were talking about before), you get the slope/intercept form. The slope and intercept shoot toward infinity when b is near 0, meaning the line is vertical. It is this division that causes problems with the slope/intercept form, so using (ax + by + c = 0) solves those problems.

As a bonus, the surface normal of the line is (a, b), and the distance from the line to the origin is 'c'. You can easily read these features out of the equation, and being a game developer you're more likely to care about these things than the y-intercept. Though we now need 3 floating point numbers to talk about our line (a, b, and c), that extra number buys convenience and software reliability. The reliability comes because the precision of our computations is more isotropic, it doesn't care so much what direction the line goes in.

To sum it up, my learning of (y = mx + b) as the way to talk about lines had impacted my effectievness at making games; the alternate representation removed those blockades.

Listing 1:

Pseudocode for the kind of thing that happens when you try to represent 2D lines in slope/intercept form. This is an example of how a singularity in mathematical representation affects code; more complex (and less dumb) versions of this happen with systems of simultaneous equations.

struct Line {
    float slope, y_intercept; // 'slope' == m, 'y_intercept' == b
    bool is_vertical;         // or else declare this, meaning the above are invalid
    float x_value;            // used only if the line is vertical.
};

bool intersect_with_vertical_line(Line *vertical, Line *non_vertical, float *x_result, float *y_result) {
     *x_result = vertical->x_value;
     *y_result = non_vertical->slope * vertical->x_value + non_vertical->y_intercept;
     return true;
}

bool intersect_nonvertical_lines(Line *line_1, Line *line_2, float *x_result, float *y_result) {
     // Hope this denominator is not small.
     *x_result = (line_2->slope - line_1->slope) / (line_2->y_intercept - line_1->y_intercept);
     // Choice of line_1 below is arbitrary, hope we're well-conditioned.
     *y_result = line_1->slope * (*x_result) + line_1->y_intercept;
     return true;
}

bool intersect_lines(Line *line_1, Line *line_2, float *x_result, float *y_result) {
    if (line_1->is_vertical) {
        if (line_2->is_vertical) return false;
        return intersect_with_vertical_line(line_1, line_2, x_result, y_result);
    }

    if (line_2->is_vertical) {
        return intersect_with_vertical_line(line_2, line_1, x_result, y_result);
    }

return intersect_nonvertical_lines(line_1, line_2, x_result, y_result);
}

Extending Lines to 3D

After a while, I'd made enough 2D games, and decided to try 3D.

When I first tried to formulate line equations in 3D, I got confused. In 2D (ax + by + c = 0) had been the best thing since sliced bread, so clearly I wanted to extend that equation to 3D. The obvious candidate is (ax + by + cz + d = 0). But wait a minute... I knew from reading books that this was the equation for a plane. But extending my line equation to 3D requires adding 'z' in somehow, right? And how else could I possibly add a 'z' that would make any sense?

The problem is that (ax + by + c = 0), which I'd thought was an enlightened way of representing a line, is not a line equation at all -- and neither is (y = mx + b), for that matter. It's a plane equation, and it only worked because lines and hyperplanes in 2D are the same thing (where my temporary definition of a hyperplane is "that which divides space into two halves".)

There is an equation that works for all lines regardless of the space's dimension. It is (L = p₀ + vt), where L represents the set of points comprising the line, p₀ is an arbitrary point known to lie on the line, v is the direction vector that the line travels in, and t is the "time" parameter. When we get used to thinking about lines this way, we build up intuition that is valid no matter how many dimensions we're dealing with. We say that this is the "parametric form" of the line, as varying the parameter t will give you every point in L. If n is the dimensionality of your space, then this equation requires 2n numbers worth of storage if you're being lackadaisical about it, or 2n-1 if you're being hardcore.

Simultaneous Equations?

So why is ax + by + etc the equation of a hyperplane, and not a line? It's because it takes n degrees of freedom (represented by the coordinate variables x, y, ...) and, by binding them together with the equal sign, places one constraint on that system of variables. This linear constraint removes one dimension; it flattens the space in the direction of the gradient of the equation (this gradient is the same thing as the normal of the hyperplane). The resulting space has n-1 dimensions: in 2D, you get a line; in 3D, a plane; in 4D, you get a 3D hyperplane.

Suppose we didn't want to use the parameteric form for a line in n dimensions. Instead, we could represent the line by starting with the full n-dimensional space and squashing it n-1 times, because n-(n-1) is 1, the dimensionality of a line. We can do this using n-1 linear equations simultaneously. Simultaneous linear equations are the same thing as a matrix, so we're storing an n by n-1 matrix. This uses a lot of storage space, and furthermore, it's not guaranteed to behave nicely. Suppose two of our equations try to squish the space in the same direction. After the first equation acts, there's nothing left for the second one to do; so the second equation doesn't reduce the space by a dimension (in fact it leaves it unchanged). After all our n-1 squashings, the remaining entity will have one more dimension than we expected; instead of a line, it will be a 2D plane.

When this kind of thing happens, we need to break out some advanced linear algebra to deal with the situation. Naive game programmer code, just consisting of a big hand-derived vector equation worked out on paper, will end up dividing by zero somewhere and freaking out. More experienced programmers might use a matrix equation, but black-box matrix methods get screwy too; we end up with a situation like the determinant of a matrix being 0 and us wanting to invert it. The matrix has no inverse; badly written code tries to invert it anyway, and thus produces inaccurate results or NaNs. Better matrix code takes stock of the situation with an 'if' statement and, if the determinant is within some epsilon of 0, it reduces the dimensionality of the matrix and solves a reduced-dimension problem. But picking suitable epsilons is not easy, and numerical discontinuities are introduced by the 'if' statement.

All this should sound familiar from an engineering standpoint -- it's the kind of thing we were doing with (y = mx + b) when the line became vertical, and all the same problems arise.

Cases of determinant 0 are often called "degenerate" but I think they are quite natural and inability to deal with them indicates weak methodology. Imagine that you have 3 different planes that pass through the origin, rotating freely. You want to find the intersection of those planes. Most of the time, they intersect in a point; but if two of the planes coincide, then all three intersect in a line; and if all three coincide, the answer is a plane.

To solve this intersection problem using beginner's linear algebra, we write a matrix equation p = A^-1d that finds the solution; but hardcoded into this equation is the assumption that the answer is a point. When the answer is not a point, A has determinant 0, so the equation is unsolvable.

But what's the big deal? Sometimes planes are coplanar, just like sometimes lines are vertical. Why should that be a problem?

The problem goes away when we stop treating matrices as black boxes that we want to invert, and instead start decomposing them and looking at their intrinsic properties. The QR and Singular Value decompositions become useful to us here.

Common Mathematical Misconceptions

I started this article with the question of how to represent a line. As 3D programmers we get past these problems early on, if only because we can't do lines in 3D otherwise. About the varying representations of a line, I want to develop an analogy: they are like other concepts that we work with from day to day, rooted into the core of our thinking, that are misleading in 3D and don't even work in higher dimensions. I'll try to nail the biggest ones below.

The Axis of Rotation

When learning 3D math, once we get past the inconvenience of Euler angles, we find that all rotations can be represented by an axis vector, about which we rotate, and an angle, representing the magnitude of the rotation. Perhaps we visualize a rotation as a wheel turning on an oriented axle.

The problem is that the whole concept of "axis of rotation" only works in 3D.

In 2D, rotations occur around a central point, and maybe we think of a nonexistent axis sticking out of the plane to help us visualize this. But a much more reasonable way to think of rotations is to speak of the "plane of rotation" rather than the axis. In n dimensions, any rotation occurs within a 2-dimensional plane, and the object rotates "around" however many dimensions are left in the space. In 2D space, you rotate around a 0-dimensional subspace, a central point. In 3D, you rotate around a 1-dimensional vector subspace. In 4D, you rotate around a 2-dimensional planar subspace.

(In 3D the surface normal of the plane of rotation is the axis vector we are used to thinking about. In higher dimensions, using this definition of rotation, it's no longer true that you can reproduce an arbitrary orientation with only one rotation. Hestenes calls these "simple rotations".)

I tend to think of rotation as "the thing that binds together any two dimensions of our space". In 3D, we have 3 canonical planes of rotation: the XY plane, which binds things that leave X to entering Y; and likewise for YZ and XZ. Any rotation occurs within a 2D plane that is a linear combination of these 3 canonical planes. In 4D, there are 6 such canonical planes.

The Cross Product

The cross product is a fundamental piece of 3D math that we use all the time. But we were taught incorrectly what the cross product is and how it works, with the result that we often use it improperly, in subtle ways.

We are usually taught only about the cross product in 3D. But what is the cross product in 4 dimensions and higher? Does the concept even make sense? Because 2 linearly independent vectors determine a 2D plane, it is possible for us to interpret the results of the cross product in n dimensions as the subspaces we were just rotating around, a few paragraphs ago: in 2D the result is a scalar, in 3D a vector, in 4D a 2D-planar thing.

Following this scheme, when we take the cross product in 3D, we think of it as returning a vector result. Unfortunately, this is wrong, and we see this in a few places. A prominent symptom is that "surface normal vectors" can't be transformed in the same way as plain vanilla vectors can; if you are transforming vectors by some transformation M, you need to transform normals by (M^T)^-1. Beginning 3D programmers may not run into this problem, because if M is just a rotation, its inverse is equal to its transpose: (M^T)^-1 = M.

This difference in transformations is necessary because the cross product is weird. We are providing two vectors as arguments of the cross product, and those vectors determine a 2-plane if they are not colinear. But the cross product implicitly takes the dual of the 2-plane, its normal vector, and returns that to us. So we think we're talking about a vector, but we're really talking about a plane through the origin. The plane occupies whichever 2 dimensions its normal vector does not; because of this, transformations can affect the plane in ways that we would not see if we considered its normal vector in isolation. Both shearing and nonuniform scaling will do this.

To ensure that we always pick the right transformation, we can say that the output of the cross product is a thing called a "form", which one might think of as a transposed vector. The form interacts with matrices in all the ways you'd expect a row vector to behave. Smart physicists have been dealing with the differences between point-like and plane-like things for a long time; eventually someone invented Einstein Index Notation, which helps disambiguate things. Jim Blinn wrote two articles that discuss the Einstein notation from a graphics programmer's point of view (reference). But this whole tensor algebra approach gets pretty complicated, so some new school physicists are evangelizing Clifford algebra (also known as geometric algebra) as a method of simplification.

Clifford algebra defines the _wedge product_ of two vectors in a way that is similar to the cross product, but it returns a non-vector result; that result is a plane-like thing called a bivector. You can take the wedge product of a bivector and another vector to get a volumetric trivector, and so on. The Clifford product of two vectors gives you a result containing both scalar and bivector parts; it is the dot product and cross product all wrapped together. This unification enables us to do things that make life easier, like dividing an equation by a vector or a plane.

In some references the 2D version of the cross product is called the "perp-dot product" (See F.S. Hill's Graphics Gem below). Pertti Lounesto's book describes higher-dimension cross products that are different from the one I've mentioned here.

Why We Should Care About N-Dimensional Generality

Recently, to generate levels of detail for humanoid character meshes, I was implementing Garland/Heckbert Error Quadric Simplification (reference). The basic version of this algorithm, which only takes mesh geometry into account, operates on 3D vectors; it uses 3D plane equations that are derived and evaluated using the cross product and the dot product. But to take vertex color and texture coordinates into account, we need to generalize the algorithm to higher dimensions.

We hit a wall when we try to move the algorithm to higher dimensions, because each face of our mesh imposes a 2-dimensional constraint on the quadric error metric. When we're in 3 dimensions, this constraint can be represented as the hyperplane (ax + by + cz + d = 0), which we're used to playing with. But when we go up to 5 dimensions (3 spatial dimensions + 2 texture coordinates per vertex), we no longer have such a tidy hyperplane equation to represent what's going on. Each face of the mesh defines a 2D plane, but now a 2D plane is just a small strand in the 5D space, so we need to represent it parametrically. This is exactly analogous to the way (ax + by + c = 0) stopped working for lines when we jumped from 2D to 3D.

Another way of looking at the problem is this: in 3D we usually get a plane from two vectors by taking the cross product. But if we're not conversant in advanced linear algebra, it is unclear how to perform this process in 5D. So be sure to eat your multidimensional Wheaties.

In their paper, when the time comes to elevate above 3 dimensions, Garland and Heckbert shift gears away from the hyperplane approach and re-derive their algorithm in a different way. But if you start with an all-encompassing approach (like Clifford algebra) from the beginning, the algorithm works no matter what dimension you deal with, and you never have to get confused or change your mode of thought. You also end up with a shorter derivation than that used in the Garland-Heckbert paper.

So the traditional tools of 3D vector math definitely hinder us in these kinds of pursuits, and broader approaches can help us. I must emphasize that Garland-Heckbert is not an obscure algorithm; it's one of the best, simplest, and most widely-used methods of performing mesh simplification.

I've mentioned Clifford Algebra a lot, but I haven't shown any concrete examples of it yet. Oops, looks like we're out of space for this month. Pertti Lounesto's book is a good introduction, as is Leo Dorst's web page and tutorial software.

References

Michael Garland and Paul S. Heckbert, "Simplifying Surfaces with Color and Texture Using Quadric Error Metrics", Proceedings of IEEE Visualization 1998. http://graphics.cs.uiuc.edu/~garland/research/quadrics.html
Pertti Lounesto, "Clifford Algebras and Spinors", London Mathematical Society Lecture Note Series #239, Cambridge University Press, 1997.
Leo Dorst, "GABLE: A Matlab Geometric Algebra Tutorial", http://carol.wins.uva.nl/~leo/clifford/gable.html
David Hestenes and Garret Sobczyk, Clifford Algebra to Geometric Calculus: A Unified Language for Mathematics and Physics, Kluwer Academic Publishers, 1984.
Jim Blinn, "Uppers and Downers", Jim Blinn's Corner: Dirty Pixels, Morgan Kaufmann, 1998.
F.S. Hill Jr., "The Pleasures of 'Perp Dot' Products", in Graphics Gems IV, Paul Heckbert, ed., Academic Press, 1994.

Acknowledgements

Thanks to Chris Hecker for redirecting the overly negative energy that originally permeated this article concept, and for some pointers regarding matrix decomposition.