Braid Particle System Revision (part 2)

Previously, discussed the particle systems in Braid. They need to be generated differently from the way most games do it; this created more difficulty with regard to random number generation than games usually have.

Back when Braid shipped, I had determined that a more-random seed would give me better behavior; so I got a more-random seed by pre-generating a table of high-quality random numbers and indexing that table based on the ID of a particle source. This seemed messy, so a couple of weeks ago I replaced the existing LCG random number generator with PCG, which is higher-quality but slower. This resulted in code that was more elegant, but the solution seems like overkill. So now I want to go back and re-solve the quality issue in a way that is elegant but also fast.

Here are a few possible approaches:

(A) Go back to BCPL’s LCG as the random-generation scheme, and just find a simpler way to generate a seed that produces better results, essentially by hashing the particle index in some way. Such a hash should be fast and simple. (You can think of the table full of Mersenne Twister-generated numbers as a precomputed hash; it’s a little weak due to the limited table size, and it also involves storage. We should be able to come up with something simpler.)

(B) PCG provides the feature of being able to fast-forward some number of steps. In the PCG paper, it is mentioned that all LCGs have this property, it’s just not commonly appreciated. This means we could figure out how to implement fast-forwarding for our old LCG, which then means we would not have to re-seed it per particle; we just seed it once per source, then fast-forward to the particular particle we want. Since we are no longer re-seeding, this should eliminate the associated quality loss.

(C) Use a different scheme for generating random-ish numbers that is more amenable to arbitrary seeking within the stream. For example, Marc Reynolds wrote this simple library for random-access pseudorandom numbers using Weyl sequences. At first glance, this looks like it could be applicable.

I tried option (A), and it worked great. In the next blog post, I’ll go over details of what I did there. But let’s explore our options more fully, and talk about (B) and (C).

Option (B): Fast-Forwarding

In Fabian’s guest post we get a pretty good idea of what would be involved in a fast-forwarding scheme. He goes way beyond the basics; for the purposes of understanding how to use this in Braid, you can stop around the middle of the posting. It’s clear: we don’t have to re-seed the LCG for every particle; instead, we could start at a known good seed and then fast-forward to a particular particle index. The downside is that fast-forwarding is going to be more complicated than seeding (the iteration to calculate the right exponent is a little complicated, and the table-lookup requires the existence of the table which is also a complication). You might think that performance here would also be bad, but we only need to fast-forward once per particle source; after that, we just step once per particle. So the cost of the fast-forward is amortized across the average number of particles per source. Also, you’re reading from a table, but that table is small and constant for all particle sources, so if you process all your particle sources at once, the performance hit from this table should be very small.

Fast-forwarding invites code fragility, though. The number of steps by which you’d want to skip forward is the index of a particle times the number of random numbers generated for each particle. Because code changes over time, you can be almost guaranteed that some new random numbers will be added to per-particle generation eventually, which will badly break the appearance of particles in a way that will confuse anyone who isn’t intimately familiar with the technique. To prevent this, we can be conservative and reserve more random numbers per particle than we actually need right now; say, 100. We probably won’t ever reach that limit, but now that limit has to be documented in the code, and that’s a complication. If we ever do exceed that limit, we probably aren’t going to detect the situation automatically (because adding code to check this at runtime is going to be slow, and there’s no good way to check it at compile time in C++), so again, the symptom is that our particles mysteriously break in a confusing way.

Also, suppose I want to extrapolate a particle system backward in time to before its time of creation. With the current per-particle seeds, this just works automatically. With a fast-exponentiation scheme, we would have to handle negative exponents somehow; maybe that can be done quickly, but at first glance it seems shaky. This would also complicate the code. So instead we can say that all particle sources begin, not at particle #0, but at particle number one million or something, and we can go backward in time up to one million particles earlier than that source’s official t = 0. But again this creates a limit in the code that we might one day exhaust unexpectedly. These kinds of limits are prominent reason why software is often overcomplicated and broken.

For these reasons, I decided not to implement a fast-forwarding scheme. I regard fast-forwarding as useful to understand for my own personal edification, but undesirable as a way to implement anything for Braid (unless other approaches end up causing bigger problems, somehow!)

Option (C): Generators That Support Constant-Time Seeking

If we really needed random-access navigation through a stream of pseudorandom numbers, the Marc Reynolds code linked above would be an interesting thing to try. However, if option (A) works (which, again, it did, as we’ll see next time), then we would only want to adapt this method if it offers benefits for this particular application.

Browsing through the code, it’s clear that it is slower than our LCG (which is not surprising, since LCGs are fairly minimal). This code performs 64-bit integer operations, rather than the 32-bit operations we are using now, so it is going to be slower (or possibly just use more ALU resources, generating more heat or increasing battery usage, etc). Also, the mixing step is more expensive than the mixing step from our LCG.

Looking at the math here, the reason this generator is easily seekable is just because each step only changes the state by adding one constant to it, so moving N steps is the equivalent of adding N times that constant. Note that LCGs have two constants, a multiplicative and an additive constant. So this is equivalent to an LCG where the multiplicative constant is 1. Thus this generator seems like it could be strictly lower-powered in its ability to generate random-seeming results. As Reynolds says in the documentation for this code,

The state update is so simple that the burden of producing a sequence with statistical random properties is on the shoulders of the bit finalizing function.

Essentially, the stepping part of the generator is just adding a really really big number with a scattered bit pattern as a way of giving us reasonably-well differing bit patterns to feed into the mixing step, and the mixing step is doing most of the work. This explains why it’s more expensive than a typical mixing step for an LCG.

If you were to carry this idea further: with a good enough mixing function (say in the extreme a cryptography-grade hash function), you wouldn’t even need the Weyl sequence – you could just feed in the particle index and get out a very random-seeming sequence of bits. This is conceptually simple but in practice is going to be much more expensive. Part of the magic of an LCG is that it gives you something so random-seeming with so little work. Cryptographic hashes, on the other hand, are willing to do a great deal of work in order to make sure things are as close to random as possible.

Anyway, since we don’t really need to seek to arbitrary indices, I did not see a justification in going this way. It is good to keep in mind, though, since it may come in handy at some point!

Next Time

As I mentioned, I solved the quality problem by improving the quality of the seed fed to the LCG. Next time we’ll look at the implementation details.

Fast-Forwarding LCGs

Braid Code Cleanup (part 1)