Braid Particle System Revision (part 1)

The game Braid originally shipped to the world in 2008, and after some ports in 2009, I have only worked significantly with the code on a few occasions. But I want to maintain this game indefinitely into the future; in the back of my mind, there have always been some clean-ups that I have wanted to perform on the code. Often when shipping a game, the best answer to a problem isn’t evident, and we are under time pressure, so we solve the problem in some way that is sufficient but sub-optimal. Other times, we need to design the game to meet technical constraints of systems we want to deploy on, but as the years go on, these systems become irrelevant, so the code can be cleaned up. I figured it would be interesting to talk about some of these things in a blog (and the blog will help motivate me to think about these situations and clean some of them up!)

So, let’s talk about particle systems and random number generators.

On the night of July 4th, instead of watching patriotic USA fireworks, I dug out the Braid code and replaced the random number generator. The old generator was the LCG used in BCPL. “LCG” stands for “Linear Congruential Generator”, which is a simple and efficient method of generating pseudorandom numbers. BCPL is a programming language from the 1960s (https://en.wikipedia.org/wiki/BCPL) that provided in its standard library an LCG with a particular set of constants. When starting the development of Braid, I was looking for a good random number generator, and a friend recommended this one to me, but it was sort of a guess on both our parts. You can find a list of LCGs and the constants used in this Wikipedia article, and a more comprehensive list here. I will refer to this particular LCG as “BCPL” even though BCPL is really the name of the programming language.

When using the BCPL random number generator to generate particles, I had needed to add some ugly workarounds to get good-looking results. These results are not necessarily the BCPL’s fault; it may just have to do with the way I was using it. So perhaps the proper initial step would be to figure out the way in which I was using BCPL incorrectly. But I was also motivated to experiment with PCG, a newly-invented random number generator that sounds really good. So I chose that as my first approach.

Like many games, Braid has particle sources, where the properties of each particle (position, velocity, size, rate of change of size, initial color, rate of change of color, etc) are produced by a pseudorandom number generator. Most games just generate these properties once when a particle is created, then simulate the particle forward in time once per frame, which is computationally inexpensive. But in Braid, one of the player’s primary actions is time manipulation, causing time to go forward and backward. The game needs particles to appear consistent as the notion of “current time” changes; if time goes forward so that a bunch of particles die, but then goes backward again, you want to see those very same particles fade back in and live their lives in reverse. You could do this by using a classical particle system and storing the state of every particle, and restoring that state when time goes backward, but that would use a lot of memory very quickly. So the problem statement for Braid was, given the settings for a particular particle source, to be able to generate the state of the particles spawned by that source at any given time t, in a way that is relatively efficient with both computation and storage.

I solve this problem by re-seeding the random number generator for every particle every frame – essentially undergoing the creation process for every particle every time it is rendered, and then simulating that particle to the current time. As long as the particle source knows exactly which particles will be active at a given time (which requires a constant emission rate, a fixed maximum lifetime, and other such constraints), then we don’t need to store the state of any of the particles. As long as the logic to evolve a particle through time is a simple closed-form function or other small piece of relatively straightforward code, that’s not too expensive computationally either (much cheaper than simulating forward N fixed timesteps from the starting state of the particle, where N can easily be in the hundreds or thousands).

In most games, particles work something like this:

to spawn N particles:
    for 1..N {
        generate position, size, color, rotation, etc of this particle,
            using a global random number generator that we don't think much about.
        add particle to simulation buffer
    }

at render time:
    N = dt * emission_rate
    spawn N new particles
    
    for each particle in each source {
        if particle has exceeded its lifetime, kill it, continue.
        
        simulate particle values incrementally by dt
            (position = old_position + velocity * dt, etc).
            
        store renderable values into the output buffer.
    }

In Braid, the code for rendering particles is something like this:

for each particle source in the current level {
    figure out index_first and index_last, the bounds on particle indices
        emitted by this source at this particular time (for example,

        index_last = floor(current_time * emission_rate);
        index_first = index_last - max_particles;
        
        (particle lifetimes are constant, so max_particles is constant; if
        you want a particle to die early, it just fades out completely, but
        still exists).

    for index_first..index_last {
        seed a per-source random number generator by this particle's index
            (multiplied by a large number or two, to make the seed seem more random).
        generate position, size, color, rotation, etc of this particle
            as they would be at spawn time.
        simulate the aforementioned values forward to current time,
            preferably using closed-form equations so it's cheap (if everything
            is closed-form, then this is just like the update step for a
            normal game, but using (current_time - spawn_time) for the dt.)
        store renderable values into the output buffer.
    }
}

The reason you have to seed for each particle, rather than only once per source, is that the range of starting and ending particle indices will be changing every frame, so there isn’t any sensible starting point in time for your seed to designate. But by seeding for each particle, you suddenly don’t care about time any more.

To my surprise, when I first implemented this, particles often looked bad. Correlations were visible between different particle attributes: for example, the X and Y components of each particle’s velocity were correlated, which biased particles to travel in some directions but not others; particle size was correlated in weird ways with speed, and so on.

It did not surprise me that seeding for every particle made the numbers less random, but I had not expected the decreased randomness to be so visually objectionable. Of course I tried several of the obvious solutions, like only using the upper bits of the LCG’s output, but they didn’t seem to matter much.

Braid particles with undesired correlations. (See the patterns in the triangular particles that fade in as the character visits each podium.)

Braid particles after fixes have been applied.

I did not know a lot about the practical usage of random number generators (I still don’t!), but I needed to fix this problem, so I’ll now explain the solution I arrived at.

It seemed clear that seeding for each particle was causing the problem. This would be very unsurprising if I were just using each particle’s index as the seed (since these indices are small integers in sequence), but I had been multiplying the index by some large number in an attempt to make it seem more random, and to give it many more bits for the LCG to work with. Apparently this had not been enough. So, what if I generated some table of higher-quality random numbers at startup (using a scheme that would be too slow to use when rendering particles), and then I indexed that table per-particle to retrieve a good random number that I would use to seed the LCG for each particle? As mentioned, I didn’t know too much about random number generation, but I knew the Mersenne Twister was supposed to be “high-quality but slow”, so I used that. The code is shown below. (Note: I am not recommending that this is a good solution; it is just how I got things working at that time. It is interesting, though, in that it solved the problem in a shipping game that’s been played by a million or two people.)

Here’s how I set up the table of higher-quality random numbers:

const int NUM_NUMBERS = 127;
globals.particle_randomization_table_size = NUM_NUMBERS;
globals.particle_randomization_table = new unsigned long[NUM_NUMBERS];

Random_Generator_Mersenne mersenne;

// (This is me writing this comment in 2016):
// The seed number below is a traditional way of initializing the Mersenne Twister, 
// but M.E. O'Neill makes the case that you can't well-seed this RNG with only 32 bits:
//           [http://www.pcg-random.org/posts/cpp-seeding-surprises.html](http://www.pcg-random.org/posts/cpp-seeding-surprises.html)
// I chose a fixed seed so that successive runs of the game are identical.
// Thus I could have just replaced this init code with an array literal containing
// pre-generated output numbers, but I think I figured the impact on startup time was small
// and that I preferred the flexibility of keeping this code live so I could change it at will.

mersenne.seed(19650218UL);

for (int i = 0; i < NUM_NUMBERS; i++) {
    globals.particle_randomization_table[i] = mersenne.get();
}

When computing the particles during each frame, the table was used in this way:

    int randomization_cursor = (first_particle_index + num_particles - 1) % globals.particle_randomization_table_size;
    if (randomization_cursor < 0) randomization_cursor += globals.particle_randomization_table_size;

    for (int i = num_particles-1; i >= 0; --i) {  // We iterate backwards because we draw new particles beneath older particles, as this just looks better; and most-obscured particles draw first, for Painter's Algorithm reasons.
        int particle_index = first_particle_index + i;

        // The next two lines are an optimization to avoid using a modulus operator per particle.
        // We really want to do: particle_index % globals.particle_randomization_table_size;
        // Note that particle_randomization_table_size is not a power of two, intentionally. (It's close though!)
        // I am presuming the 'if' gets compiled into a CMOV.
        
        int randomization_index = randomization_cursor;
        if (--randomization_cursor < 0) randomization_cursor = globals.particle_randomization_table_size-1;

        unsigned int number_a = globals.particle_randomization_table[randomization_index];

        // 'gen' is the random number generator.
        gen.seed(number_a * particle_index * big_number);

        // The rest of the particle generation logic goes here, for example:
        p.lifetime  = lerp(lifetime0, lifetime, gen.get_zero_to_one());
        p.size      = gen.get_within_range(size_scale_0, size_scale_1);
        p.dtheta_dt = gen.get_within_range(scaled_dtheta_dt_0, scaled_dtheta_dt_1);
        p.theta0    = lerp(theta0, theta1, gen.get_zero_to_one());
        ...
        // Including a lot more code that does the simulation part.
    }

You’ll note here the presence of a variable ‘big_number’ inside the seed argument. big_number is computed in the following way, once per particle system (not once per particle):

    gen.seed(source->portable_id * 13482034981);  // @Hack: This number is meaningless, I just banged on the keyboard.  It could be a really bad number.
    gen.get();
    gen.get();
    gen.get();
    gen.get();
    gen.get();
    gen.get();
    unsigned int big_number = gen.get();

For context, gen.get() looks like this:

inline u32 Random_Generator::get() {
    state = state * 2147001325 + 715136305; // BCPL generator
    // Shuffle non-random bits to the middle, and xor to decorrelate with seed.
    return 0x31415926 ^ ((state >> 16) + (state << 16));
}

Note my continuing commitment to quality there where I call gen.seed for the particle source (yes, that comment is from the original source code). I suppose I should have investigated whether that number is pathologically bad for BCPL.

So, per particle system, I am seeding once by the ID of the particle system (so this is constant per-particle system), then generating 6 ‘random’ numbers and throwing them away, then using the seventh. I throw 6 away because I was finding this necessary to keep the visual appearance of randomness, even using the Mersenne Twister starting table. After 6 numbers, things looked random enough.

I did not feel great about this. Sure, these extra generations were happening only once per particle system, rather than per particle, but in Braid a lot of particle systems contain relatively few particles, and speed was at a premium. (On the Xbox 360 I’d had to write a SIMD version of the particle generation code in order for the game to hit 60fps everywhere).

But when you’re shipping a game, you have so much to worry about that some things just fall off the table. The disciplined approach would have been to stop trying to mix a bunch of numbers together in a bowl and hope it magically works, and instead learn more about random number generators, and find a simpler and cleaner solution. But if I had done that, the game would have come out later (or not at all; I was having a hard time psychologically by the time the game was finally done, 3.5 years into the project, and many independent game development efforts collapse long before the game is done).

So instead, I mixed a bunch of numbers together in a bowl, and after trying a few things it worked well enough.

Of course, it wasn’t quite that simple, because there were 4 variants of the particle-generation code that were mostly copies of each other: (1) the slow path to fill out to seed the initial state of a particle system, (2) to iterate the state of those particles forward quickly every frame, (3) to implement the special case of particles along a line (for the sun only), and (4) to implement the particles for the Ring in world 6 (the calling site is actually ifdeffed out, but this code is still hanging around for some reason, presumably because I thought it might come back at some point). (3) and (4) were factored into their own implementations so that they would not slow down the particle evaluation in (2) with extra conditionals.

What is extra fun is that (1) and (2) need to compute exactly the same result, otherwise you’ll get discontinuities when particle systems wrap or when rewind is engaged. I had forgotten this when I started doing this particle system revision, and I spent about 20 minutes trying to figure out what was going on before I saw how things are structured.

This clumsy randomization hack has been sticking in the back of my mind as one of the things I am vaguely unhappy with in the Braid code. (It is not nearly the worst, because this one isn’t visible to the user; there’s other stuff, like the camera handling and occasional collision detection weirdness, that upset me more because you feel them when you play).

For some reason, this week I decided to fix it. As it turns out, simply by using PCG instead of BCPL, I was able to strip out the hacks, leaving only the “straightforward” system of re-seeding for each particle based on its index.

I was able to remove the Mersenne Twister completely from the codebase (two files containing 79 lines of code (not including blank lines and comments)). Then, after simplifying away all the extra code that generated and used that precomputed table, I saved another 62 lines of code, for a total of 141. Most of the code in the listings above has been removed.

So, at the end of the day, I removed a bunch of ugly hacks and ended up with shorter, more-elegant code. So this was a win, right?

Not clearly. One reason is that PCG is slower than BCPL, and there’s not really a good reason for us to be paying that cost. Again, here’s what BCPL number generation looks like:

inline u32 Random_Generator::get() {
    // state is 32 bits.
    state = state * 2147001325 + 715136305;
    return 0x31415926 ^ ((state >> 16) + (state << 16));
}

and here’s what PCG looks like:

inline u32 Random_Generator::get() {
    // state, inc are both 64 bits!

    u64 oldstate = state;
    state = oldstate * 6364136223846793005ULL + (inc|1);

    u32 xorshifted = ((oldstate >> 18u) ^ oldstate) >> 27u;
    u32 rot = oldstate >> 59u;
    return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));
}

PCG is doing substantially more arithmetic, and it’s doing it on numbers that are twice as wide, and the arithmetic has some unhappy dependencies (for example, shifting by the variable number of bits ‘rot’, which is pretty bad on some architectures).

If this were some kind of cryptographic application, paying this cost might be warranted. But we’re just trying to generate some visuals; we don’t need to pass rigorous statistical tests!

But the real reason this isn’t a win is that I didn’t increase my knowledge in any deep way. I learned some surface stuff about random number generators by skimming the PCG paper, but really so far I have been lazy. I don’t even have the conceptual tools yet to reason about what the best solution for this problem would look like; so long as that is the case, I am doomed to just keep trying random things.

Fortunately, in the light reading I have done so far, I’ve learned enough to get a little bit of traction, and I see several potential directions in which to go to improve this situation.

It seems like we ought to be able to go back to BCPL and do something simple to increase the quality of our seeding. At the same time, though, from what I am learning about PCG, LCGs, and other types of random number generators, there are a few possibilities that should at least be discussed. We’ll continue this in part 2, in which I actually start learning real things about random number generators, and stop treating them like magical black boxes.

John Carmack on Inlined Code

Fast-Forwarding LCGs