The Inner Product, February 2002
Jonathan Blow (jon@number-none.com)
Last updated 17 January 2004
Mathematical Growing Pains
Related articles:
None
This month we're going to talk about how to represent the sorts of linear
entities that we manipulate all the time, like lines and planes.
In high school I was taught that the equation (y = mx + b) is a groovy way to
represent a line in 2D. The equation is useful because the 'm' represents the
slope and the 'b' is the y-intercept -- that is, the line intersects the y axis
at (0, b). This representation is good if you don't have a lot of higher math
experience and you just want to draw a line on a piece of graph paper: 'b' gives
you a starting point, and 'm' gives you the direction to go from there.
Years passed until one day I was programming some pretty advanced 2D games; by
then I had used (y = mx + b) for visualization so often that I thought of it as
the primary way to talk about lines. So I tried to make systems that represented
lines with two floating point numbers, 'm' and 'b'.
But what happens when a line is vertical? Its slope is undefined. In that case,
in high school you'd just write (x = k), which seemed simple enough. But when
you start writing games, you have to think about more complex situations, like
lines that are smoothly rotating from frame to frame. And you're writing code
that uses limited-precision numbers, so your computations become numerically
ill-conditioned when the lines are steep, because 'm' is such a huge number. To
fix this, you put a bunch of 'if' statements into your code to change the
computation based on what neighborhood the slope is in. That sucks from a
software engineering standpoint, and the computational discontinuities (these
happen as your parameters cross from one 'if' case into another) may cause
subtle but disturbing things to happen in your game. See the pseudocode in
Listing 1.
These problems go away when you make a simple mental adjustment and use (ax + by
+ c = 0) as your line equation. This is like the slope/intercept equation, but
before a division has taken place: if you divide (ax + by + c = 0) by b (the 'b'
from this equation, not the 'b' we were talking about before), you get the
slope/intercept form, with slope m = -a/b and intercept -c/b. Both of these
shoot toward infinity when b is
near 0, meaning the line is vertical. It is this division that causes problems
with the slope/intercept form, so using (ax + by + c = 0) solves those problems.
As a bonus, the surface normal of the line is (a, b); and if you scale the
equation so that (a, b) has unit length, |c| is the distance from the line to
the origin. You can easily read these features out of the equation, and being a
game developer you're more likely to care about these things than the
y-intercept. Though we now need 3 floating-point numbers to talk about our line
(a, b, and c), that extra number buys convenience and software reliability. The
reliability comes because the precision of our computations is more isotropic:
it doesn't care so much which direction the line points in.
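To make the benefit concrete, here is a minimal sketch (my code, not one of
this column's listings) of an intersection routine for lines stored in
(ax + by + c = 0) form. The only special case left is genuine parallelism, and
the epsilon below is an arbitrary choice for illustration; contrast this single
code path with Listing 1.

#include <math.h>

struct Line2 {
    float a, b, c;  // the line is the set of points where ax + by + c == 0
};

bool intersect_lines(const Line2 &l1, const Line2 &l2, float *x, float *y) {
    float det = l1.a * l2.b - l2.a * l1.b;  // zero when the lines are parallel
    if (fabsf(det) < 1e-6f) return false;   // parallel or coincident
    *x = (l1.b * l2.c - l2.b * l1.c) / det; // Cramer's rule on the 2x2 system
    *y = (l2.a * l1.c - l1.a * l2.c) / det;
    return true;
}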
To sum it up, my learning of (y = mx + b) as the way to talk about lines had
hampered my effectiveness at making games; the alternate representation removed
those obstacles.
Listing 1: Pseudocode for the kind of thing that happens when you try to represent 2D lines in slope/intercept form. This is an example of how a singularity in mathematical representation affects code; more complex (and less dumb) versions of this happen with systems of simultaneous equations.
struct Line {
    bool is_vertical;
    float m, b;  // slope and y-intercept; used when !is_vertical
    float k;     // the vertical line (x = k); used when is_vertical
};

bool intersect_with_vertical_line(Line *vertical, Line *non_vertical, float *x_result, float *y_result);
bool intersect_nonvertical_lines(Line *line_1, Line *line_2, float *x_result, float *y_result);

bool intersect_lines(Line *line_1, Line *line_2, float *x_result, float *y_result) {
    if (line_1->is_vertical) {
        if (line_2->is_vertical) return false;  // both vertical: no single intersection
        return intersect_with_vertical_line(line_1, line_2, x_result, y_result);
    }
    if (line_2->is_vertical)
        return intersect_with_vertical_line(line_2, line_1, x_result, y_result);
    return intersect_nonvertical_lines(line_1, line_2, x_result, y_result);
}
Extending Lines to 3D
After a while, I'd made enough 2D games, and decided to try 3D.
When I first tried to formulate line equations in 3D, I got confused. In 2D,
(ax + by + c = 0) had been the best thing since sliced bread, so clearly I wanted to
extend that equation to 3D. The obvious candidate is (ax + by + cz + d = 0). But
wait a minute... I knew from reading books that this was the equation for a
plane. But extending my line equation to 3D requires adding 'z' in somehow,
right? And how else could I possibly add a 'z' that would make any sense?
The problem is that (ax + by + c = 0), which I'd thought was an enlightened way
of representing a line, is not a line equation at all -- and neither is (y = mx
+ b), for that matter. It's a plane equation, and it only worked because lines
and hyperplanes in 2D are the same thing (where my temporary definition of a
hyperplane is "that which divides space into two halves".)
There is an equation that works for all lines regardless of the space's
dimension. It is (L = p0 + vt), where L represents the set of points
comprising the line, p0 is an arbitrary point known to lie on the
line, v is the direction vector that the line travels in, and t is the "time"
parameter. When we get used to thinking about lines this way, we build up
intuition that is valid no matter how many dimensions we're dealing with. We say
that this is the "parametric form" of the line, as varying the parameter t will
give you every point in L. If n is the dimensionality of your space, then this
equation requires 2n numbers worth of storage if you're being lackadaisical
about it, or 2n-1 if you're being hardcore.
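As a sketch of how simple this is in code (the names here are mine, and
Vector3 stands in for a vector of whatever dimension you're working in):

struct Vector3 { float x, y, z; };

Vector3 add(Vector3 p, Vector3 q)  { return { p.x + q.x, p.y + q.y, p.z + q.z }; }
Vector3 scale(Vector3 v, float s)  { return { v.x * s, v.y * s, v.z * s }; }

struct ParametricLine {
    Vector3 p0;  // an arbitrary point known to lie on the line
    Vector3 v;   // the direction the line travels in
};

// L = p0 + vt: sweeping t over the reals produces every point on the line.
Vector3 point_at(const ParametricLine &line, float t) {
    return add(line.p0, scale(line.v, t));
}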
Simultaneous Equations?
So why is (ax + by + ... = 0) the equation of a hyperplane, and not a line? It's
because it takes n degrees of freedom (represented by the coordinate variables
x, y, ...) and, by binding them together with the equal sign, places one
constraint on that system of variables. This linear constraint removes one
dimension; it flattens the space in the direction of the gradient of the
equation (this gradient is the same thing as the normal of the hyperplane). The
resulting space has n-1 dimensions: in 2D, you get a line; in 3D, a plane; in
4D, you get a 3D hyperplane.
Suppose we didn't want to use the parametric form for a line in n dimensions.
Instead, we could represent the line by starting with the full n-dimensional
space and squashing it n-1 times, because n-(n-1) is 1, the dimensionality of a
line. We can do this using n-1 linear equations simultaneously. Simultaneous
linear equations are the same thing as a matrix, so we're storing an (n-1)-by-n
matrix of coefficients (plus a constant per equation). This uses a lot of
storage space, and furthermore, it's not guaranteed
to behave nicely. Suppose two of our equations try to squish the space in the
same direction. After the first equation acts, there's nothing left for the
second one to do; so the second equation doesn't reduce the space by a dimension
(in fact it leaves it unchanged). After all our n-1 squashings, the remaining
entity will have one more dimension than we expected; instead of a line, it will
be a 2D plane.
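In 3D, for instance, the two squashings are two plane equations, and the line's
direction comes from the cross product of their normals. This sketch (my code,
reusing the Vector3 from earlier; the epsilon is an arbitrary choice for
illustration) shows exactly where the degeneracy bites:

struct Plane { float a, b, c, d; };  // the plane ax + by + cz + d = 0

// The direction of the line where two planes meet is the cross product of
// their normals.  When the two equations squash space in the same direction
// (parallel normals), the cross product collapses toward zero length and
// there is no line to find.
bool intersection_direction(const Plane &p1, const Plane &p2, Vector3 *dir) {
    Vector3 n1 = { p1.a, p1.b, p1.c };
    Vector3 n2 = { p2.a, p2.b, p2.c };
    Vector3 cr = { n1.y * n2.z - n1.z * n2.y,
                   n1.z * n2.x - n1.x * n2.z,
                   n1.x * n2.y - n1.y * n2.x };
    float len2 = cr.x * cr.x + cr.y * cr.y + cr.z * cr.z;
    if (len2 < 1e-12f) return false;  // the planes are (nearly) parallel
    *dir = cr;
    return true;
}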
When this kind of thing happens, we need to break out some advanced linear
algebra to deal with the situation. Naive game programmer code, just consisting
of a big hand-derived vector equation worked out on paper, will end up dividing
by zero somewhere and freaking out. More experienced programmers might use a
matrix equation, but black-box matrix methods get screwy too; we end up with a
situation like the determinant of a matrix being 0 and us wanting to invert it.
The matrix has no inverse; badly written code tries to invert it anyway, and
thus produces inaccurate results or NaNs. Better matrix code takes stock of the
situation with an 'if' statement and, if the determinant is within some epsilon
of 0, it reduces the dimensionality of the matrix and solves a reduced-dimension
problem. But picking suitable epsilons is not easy, and numerical
discontinuities are introduced by the 'if' statement.
All this should sound familiar from an engineering standpoint -- it's the kind
of thing we were doing with (y = mx + b) when the line became vertical, and all
the same problems arise.
Cases of determinant 0 are often called "degenerate", but I think they are
quite natural, and an inability to deal with them indicates weak methodology.
Imagine that
you have 3 different planes that pass through the origin, rotating freely. You
want to find the intersection of those planes. Most of the time, they intersect
in a point; but if two of the planes coincide, then all three intersect in a
line; and if all three coincide, the answer is a plane.
To solve this intersection problem using beginner's linear algebra, we write a
matrix equation (p = A^-1 d) that finds the solution; but hardcoded into
this equation is the assumption that the answer is a point. When the answer is
not a point, A has determinant 0, so the equation is unsolvable.
But what's the big deal? Sometimes planes are coplanar, just like sometimes
lines are vertical. Why should that be a problem?
The problem goes away when we stop treating matrices as black boxes that we want
to invert, and instead start decomposing them and looking at their intrinsic
properties. The QR and Singular Value decompositions become useful to us here.
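As an illustration of what looking at intrinsic properties buys us, here is a
sketch that classifies the three-planes-through-the-origin intersection by the
singular values of the matrix of plane normals, never attempting an inversion.
(It uses the Eigen C++ library, which is my choice for the example, not
something from this column; a relative tolerance still has to be chosen, but
there is only one of them, and it has a clear meaning.)

#include <Eigen/Dense>

// The rank of the matrix of plane normals tells us what the intersection is.
// Eigen returns the singular values sorted in decreasing order.
int intersection_dimension(const Eigen::Matrix3d &normals, double tolerance) {
    Eigen::JacobiSVD<Eigen::Matrix3d> svd(normals, Eigen::ComputeFullV);
    Eigen::Vector3d s = svd.singularValues();
    int rank = 0;
    for (int i = 0; i < 3; i++)
        if (s[i] > tolerance * s[0]) rank++;
    return 3 - rank;  // 0: a point (the origin); 1: a line; 2: a plane
}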
Common Mathematical Misconceptions
I started this article with the question of how to represent a line. As 3D
programmers we get past these problems early on, if only because we can't do
lines in 3D otherwise. About the varying representations of a line, I want to
develop an analogy: they are like other concepts that we work with from day to
day, rooted into the core of our thinking, that are misleading in 3D and don't
even work in higher dimensions. I'll try to nail the biggest ones below.
The Axis of Rotation
When learning 3D math, once we get past the inconvenience of Euler angles, we
find that all rotations can be represented by an axis vector, about which we
rotate, and an angle, representing the magnitude of the rotation. Perhaps we
visualize a rotation as a wheel turning on an oriented axle.
The problem is that the whole concept of "axis of rotation" only works in 3D.
In 2D, rotations occur around a central point, and maybe we think of a
nonexistent axis sticking out of the plane to help us visualize this. But a much
more reasonable way to think of rotations is to speak of the "plane of rotation"
rather than the axis. In n dimensions, any rotation occurs within a
2-dimensional plane, and the object rotates "around" however many dimensions are
left in the space. In 2D space, you rotate around a 0-dimensional subspace, a
central point. In 3D, you rotate around a 1-dimensional vector subspace. In 4D,
you rotate around a 2-dimensional planar subspace.
(In 3D the surface normal of the plane of rotation is the axis vector we are
used to thinking about. In higher dimensions, using this definition of rotation,
it's no longer true that you can reproduce an arbitrary orientation with only
one rotation. Hestenes calls these "simple rotations".)
I tend to think of rotation as "the thing that binds together any two dimensions
of our space". In 3D, we have 3 canonical planes of rotation: the XY plane,
which carries things leaving the X direction into the Y direction; and likewise
for YZ and XZ. Any
rotation occurs within a 2D plane that is a linear combination of these 3
canonical planes. In 4D, there are 6 such canonical planes.
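One payoff of the plane-of-rotation view is that rotation code can be written
once, for any dimension. This sketch (the formula and names are mine; it
assumes u and v are orthonormal) rotates a point by theta within the plane
spanned by u and v, leaving everything outside that plane alone:

#include <math.h>

void rotate_in_plane(const float *u, const float *v, float theta,
                     const float *x, float *result, int n) {
    // The coordinates of x within the plane of rotation:
    float xu = 0, xv = 0;
    for (int i = 0; i < n; i++) { xu += x[i] * u[i]; xv += x[i] * v[i]; }

    float c = cosf(theta) - 1.0f;
    float s = sinf(theta);

    // Rotate the in-plane part of x; the rest of x passes through unchanged.
    for (int i = 0; i < n; i++)
        result[i] = x[i] + c * (xu * u[i] + xv * v[i])
                         + s * (xu * v[i] - xv * u[i]);
}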
The Cross Product
The cross product is a fundamental piece of 3D math that we use all the time.
But we were taught incorrectly what the cross product is and how it works, with
the result that we often use it improperly, in subtle ways.
We are usually taught only about the cross product in 3D. But what is the cross
product in 4 dimensions and higher? Does the concept even make sense? Because 2
linearly independent vectors determine a 2D plane, it is possible for us to
interpret the results of the cross product in n dimensions as the subspaces we
were just rotating around, a few paragraphs ago: in 2D the result is a scalar,
in 3D a vector, in 4D a 2D-planar thing.
Following this scheme, when we take the cross product in 3D, we think of it as
returning a vector result. Unfortunately, this is wrong, and we see this in a
few places. A prominent symptom is that "surface normal vectors" can't be
transformed in the same way as plain vanilla vectors can; if you are
transforming vectors by some transformation M, you need to transform normals by
(M^T)^-1, the inverse of its transpose. Beginning 3D programmers may not run
into this problem, because if M is just a rotation, its inverse is equal to its
transpose, so (M^T)^-1 = M.
This difference in transformations is necessary because the cross product is
weird. We are providing two vectors as arguments of the cross product, and those
vectors determine a 2-plane if they are not collinear. But the cross product
implicitly takes the dual of the 2-plane, its normal vector, and returns that to
us. So we think we're talking about a vector, but we're really talking about a
plane through the origin. The plane occupies whichever 2 dimensions its normal
vector does not; because of this, transformations can affect the plane in ways
that we would not see if we considered its normal vector in isolation. Both
shearing and nonuniform scaling will do this.
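Here's a tiny demonstration of the nonuniform-scaling case (my numbers, not
from the column). Under the scale M = diag(2, 1, 1), a tangent vector
transforms by M, and the normal must transform by (M^T)^-1, which for a
diagonal matrix just means dividing each component by its scale factor:

#include <stdio.h>

int main(void) {
    float tangent[3] = { 1,  1, 0 };  // lies in the surface
    float normal[3]  = { 1, -1, 0 };  // perpendicular to the tangent
    float scale[3]   = { 2,  1, 1 };  // the nonuniform scale M = diag(2, 1, 1)

    float t2[3], n_naive[3], n_right[3];
    for (int i = 0; i < 3; i++) {
        t2[i]      = scale[i] * tangent[i];  // transform the tangent by M
        n_naive[i] = scale[i] * normal[i];   // wrong: transform the normal by M
        n_right[i] = normal[i] / scale[i];   // right: transform by (M^T)^-1
    }

    // Prints 3 and 0: the naively transformed normal is no longer
    // perpendicular to the transformed tangent; the correct one still is.
    printf("%g %g\n",
           t2[0] * n_naive[0] + t2[1] * n_naive[1] + t2[2] * n_naive[2],
           t2[0] * n_right[0] + t2[1] * n_right[1] + t2[2] * n_right[2]);
    return 0;
}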
To ensure that we always pick the right transformation, we can say that the
output of the cross product is a thing called a "form", which one might think of
as a transposed vector. The form interacts with matrices in all the ways you'd
expect a row vector to behave. Smart physicists have been dealing with the
differences between point-like and plane-like things for a long time; eventually
someone invented Einstein Index Notation, which helps disambiguate things. Jim
Blinn wrote two articles that discuss the Einstein notation from a graphics
programmer's point of view (see the Blinn entry in the References). But this
whole tensor algebra approach
gets pretty complicated, so some new school physicists are evangelizing Clifford
algebra (also known as geometric algebra) as a method of simplification.
Clifford algebra defines the _wedge product_ of two vectors in a way that is
similar to the cross product, but it returns a non-vector result; that result is
a plane-like thing called a bivector. You can take the wedge product of a
bivector and another vector to get a volumetric trivector, and so on. The
Clifford product of two vectors gives you a result containing both scalar and
bivector parts; it is the dot product and cross product all wrapped together.
This unification enables us to do things that make life easier, like dividing an
equation by a vector or a plane.
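As a taste of the wedge product in code (my sketch, reusing the Vector3 from
earlier): in 3D its three bivector components are numerically the same as the
cross product's components, but we keep them labeled as a plane rather than a
vector, so we know how to transform them. In n dimensions the same formula,
B_ij = u_i * v_j - u_j * v_i, simply produces n(n-1)/2 components.

struct Bivector3 { float yz, zx, xy; };  // each component names the plane it spans

Bivector3 wedge(Vector3 u, Vector3 v) {
    return { u.y * v.z - u.z * v.y,    // yz component
             u.z * v.x - u.x * v.z,    // zx component
             u.x * v.y - u.y * v.x };  // xy component
}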
In some references the 2D version of the cross product is called the "perp-dot
product" (See F.S. Hill's Graphics Gem below). Pertti Lounesto's book describes
higher-dimension cross products that are different from the one I've mentioned
here.
Why We Should Care About N-Dimensional Generality
Recently, to generate levels of detail for humanoid character meshes, I was
implementing Garland/Heckbert Error Quadric Simplification (see References). The
basic version of this algorithm, which only takes mesh geometry into account,
operates on 3D vectors; it uses 3D plane equations that are derived and
evaluated using the cross product and the dot product. But to take vertex color
and texture coordinates into account, we need to generalize the algorithm to
higher dimensions.
We hit a wall when we try to move the algorithm to higher dimensions, because
each face of our mesh imposes a 2-dimensional constraint on the quadric error
metric. When we're in 3 dimensions, this constraint can be represented as the
hyperplane (ax + by + cz + d = 0), which we're used to playing with. But when we
go up to 5 dimensions (3 spatial dimensions + 2 texture coordinates per vertex),
we no longer have such a tidy hyperplane equation to represent what's going on.
Each face of the mesh defines a 2D plane, but now a 2D plane is just a small
strand in the 5D space, so we need to represent it parametrically. This is
exactly analogous to the way (ax + by + c = 0) stopped working for lines when we
jumped from 2D to 3D.
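A parametric 2-plane needs nothing exotic; this sketch (the names and layout
are mine) is the direct analogue of (L = p0 + vt), and works in any dimension:

struct Vector5 { float e[5]; };  // 3 spatial dimensions + 2 texture coordinates

struct FacePlane {
    Vector5 p0;    // a point on the plane, e.g. one vertex of the face
    Vector5 u, v;  // two independent directions, e.g. two edges of the face
};

// The plane is the set p0 + s*u + t*v as s and t sweep over the reals.
Vector5 point_at(const FacePlane &p, float s, float t) {
    Vector5 r;
    for (int i = 0; i < 5; i++)
        r.e[i] = p.p0.e[i] + s * p.u.e[i] + t * p.v.e[i];
    return r;
}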
Another way of looking at the problem is this: in 3D we usually get a plane from
two vectors by taking the cross product. But if we're not conversant in advanced
linear algebra, it is unclear how to perform this process in 5D. So be sure to
eat your multidimensional Wheaties.
In their paper, when the time comes to elevate above 3 dimensions, Garland and
Heckbert shift gears away from the hyperplane approach and re-derive their
algorithm in a different way. But if you start with an all-encompassing approach
(like Clifford algebra) from the beginning, the algorithm works no matter what
dimension you deal with, and you never have to get confused or change your mode
of thought. You also end up with a shorter derivation than that used in the
Garland-Heckbert paper.
So the traditional tools of 3D vector math definitely hinder us in these kinds
of pursuits, and broader approaches can help us. I must emphasize that Garland-Heckbert
is not an obscure algorithm; it's one of the best, simplest, and most
widely-used methods of performing mesh simplification.
I've mentioned Clifford algebra a lot, but I haven't shown any concrete examples
of it yet. Oops, looks like we're out of space for this month. Pertti Lounesto's
book is a good introduction, as is Leo Dorst's web page and tutorial software.
References
Michael Garland and Paul S. Heckbert, "Simplifying Surfaces with Color and
Texture Using Quadric Error Metrics", Proceedings of IEEE Visualization 1998.
http://graphics.cs.uiuc.edu/~garland/research/quadrics.html
Pertti Lounesto, "Clifford Algebras and Spinors", London Mathematical Society
Lecture Note Series #239, Cambridge University Press, 1997.
Leo Dorst, "GABLE: A Matlab Geometric Algebra Tutorial",
http://carol.wins.uva.nl/~leo/clifford/gable.html
David Hestenes and Garret Sobczyk, "Clifford Algebra to Geometric Calculus: A
Unified Language for Mathematics and Physics", Kluwer Academic Publishers, 1984.
Jim Blinn, "Uppers and Downers", Jim Blinn's Corner: Dirty Pixels, Morgan
Kaufmann, 1998.
F.S. Hill Jr., "The Pleasures of 'Perp Dot' Products", in Graphics Gems IV,
Paul Heckbert, ed., Academic Press, 1994.
Acknowledgements
Thanks to Chris Hecker for redirecting the overly negative energy that
originally permeated this article concept, and for some pointers regarding
matrix decomposition.