The Inner Product, May 2002
Jonathan Blow (jon@number-none.com)
Last updated 12 January 2004

Packing Integers

Related articles:

Scalar Quantization (June 2002)
Transmitting Vectors (July 2002)
Arithmetic Coding, Part 1 (August 2003)
Arithmetic Coding, Part 2 (September 2003)

Adaptive Compression Across Unreliable Networks (October 2003)
Piecewise Linear Data Models (November 2003)

This month we're going to start a series of articles about packing information from a 3D game into small spaces.  There are many different reasons why you'd want to do this.  Maybe you're writing a console game, and you want to fit your save-game data into as small a space as possible, for storage on a memory card.  Or maybe you're writing a networked game, and you want it to perform well over low-bandwidth connections -- or, if you're making a massively multiplayer game, maybe you want to save your company \$100k per month in server bandwidth costs.

In the March 2002 issue of Game Developer, in his article "Distributing Object State for Networked Games Using Object Views", Rick Lambright provides a high-level overview of game networking.  Now we will look at techniques that fill in the low-level side of that picture, discussing what you actually want to send over the wire.

We'll start simply this month by talking about integers.  Later, we'll do real numbers, 3D vectors, constrained 3D vectors (like unit vectors), and of course rotations.  Representing rotations in a size-efficient way is an interesting problem that turns out to be pretty deep.

Compression

Eventually we'll be looking at probability-modeling compression schemes like Huffman and Arithmetic coding, but we'll start with simpler methods.  General-purpose compressors have surprisingly limited use when we try to apply them to games.  For example, the best Huffman coders use adaptive encoding techniques, which means the full history of data to be transmitted is used to generate statistics that help make the data small.  To uncompress this data, you need to receive the entire stream in order, so that you can retrace the compressor's steps.  But if you're writing a networked game, you want to use an unreliable delivery protocol like UDP so that gameplay doesn't stall due to line noise or other packet loss -- you need to process incoming network messages with minimal dependencies between them.  But adaptive encoding creates a dependency for each message upon each previous message, which requires relaible, sequential delivery by a system like TCP -- and thus causes poor game performance.

The other thing about compressors is that they're not guaranteed to make your data smaller; in fact, they may make it bigger.  When you look at the results over all possible input values, compressors don't compress at all; their output on average is as large, or larger, than their input.  So if you absolutely must save your game state into a 64k block on a memory card, regardless of the positions of entities in the world, a compressor is not going to help you.

Suppose we're a boring game developer, and we're going to make another Quake-style first person shooter.  We only have 5 kinds of entities in our universe, for which we have declared some enumerated constants: PLAYER = 0, ROCKET = 1, AMMO_PACK = 2, HEALTH_PACK = 3, CRATE = 4.  We want to save the types of a bunch of these entities into a file.

We could be extravagant and write a full CPU register-sized value into the file for each entity type.  These days, a CPU register is a 32-bit integer, so we are obviously wasting a whole lot of space.  Clearly, values from 0 to 4 will all fit into a byte.  So we can write one byte per entity.  But that still wastes a fair amount of space, as you can see from Table 1: we're leaving 5 bits per value completely unused.

Table 1:

 Entity Type Value (base 10) Encoding (8 bits) PLAYER 0 00000000 ROCKET 1 00000001 AMMO_PACK 2 00000010 HEALTH_PACK 3 00000011 CRATE 4 00000100

Packing Bits

Clearly, we can reduce our storage size by using only 3 bits per entity.  The cleanest way to do this is to create some helper code that takes as arguments a value, and how many bits that value occupies; then it packs that value into a string of bytes.  We pack in a 3-bit number for each entity in the world, and when we're done, we ask it how many bytes of data it has collected (rounding up to the next byte boundary).  Then we write those bytes out to disk.

Listing 1 shows some code that does this.  It writes the input number into the destination buffer, one bit at a time.  This listing is written for clarity and simplicity, not speed.  A faster version, which operates on groups of bits simultaneously, is available in this month's source code on the Game Developer web site.  Also, for brevity, some implementation concerns have been omitted, like range-checking our position in the output buffer.

To successfully decode this buffer and read the values back in, we need to know how many values are stored in the buffer and how big they are, because we didn't store that information.  Usually, in a real game situation, we will have some set number of fields for an entity for which we know the sizes of the data (its type, position, orientation, etc).  In cases where not everything is predetermined, we will pack some additional data at the beginning of the buffer that tells us what we need to decode correctly.

To actually implement reading back the values, we want a routine that grabs bits out of a buffer.  We won't show the listing here, because the idea is very similar to our bit packer; it's included in this month's sample code, along with some tests that pack and unpack sequences of data.

At this point, just by using the Bit_Packer, you have a 90% solution for storing integers, in terms of fitting them within a small guaranteed space.  Many people stop here.  But if we really care about making things small, we can do better.

Packing Values With Sub-Bit Precision

Looking back at Table 1, it ought to be obvious that we don't even need 3 whole bits to store an entity type.  We only use the 3rd bit once; if we'd had one less type of item, we could fit the values into 2 bits each (though that would mean making a shooter without crates, which would be absurd).  As a corollary, while using 3 bits, we could have up to 8 different item types.  But since our imagination limits us to 5 types, for now we are just wasting almost a bit per type that we pack.

The Bit_Packer is a very programmer-oriented solution to the value packing problem.  In the past, we as game developers have had to do a lot of low-level stuff to make our games perform well.  We've taught ourselves to think about numbers in terms of bits and bytes, and value-packing looks like a problem that is well-served by that kind of thought.  But in fact, restricting ourselves to bit boundaries is problematic -- it's causing us to waste that extra space.

We can get that space back if we stop thinking about bits, and instead consider what is necessary to encode and decode an integer value.  Suppose we want to pack two entity types together.  Picture a 5x5 grid of squares, where the x-coordinate of each square in the grid indicates the type of the first entity, and the y-coordinate indicates the type of the second entity.  Every combination of entity types maps to some square in this grid; since there are 5x5=25 different squares, we can store both entity types in 5 bits with room to spare.  Whereas before, at 3 bits per entity type, we would have required 6 bits.

Any programmer who's ever addressed a screen of pixels knows how to generate a single integer index for this grid: it's y*5+x, where x and y range from 0 to 4.  To decode this integer, which we'll call n, we just reverse this process: x = n%5; y = n/5.

This process scales to any number of dimensions, and for any range of integer values in each dimension (as a quick exercise, visualize the 3-dimensional case).  Listing 2 shows code that packs an arbitrary number of values into a 32-bit integer.  In my own code, I use the Multiplication_Packer to cram as many values together as I can without exceeding the packer's 32-bit limit; then I pass the results to the Bit_Packer, which handles packing it into a long buffer of bytes.  This still induces a little bit of waste, but I figure that if I can reduce my memory wastage to a fraction of a bit in every 30, then I am doing just fine.  Extra sweating does not seem too worthwhile to me at that point.  But if you feel that you must absolutely eliminate all possible memory waste, you can write a version of the Multiplication_Packer that automatically stores its results into a byte array when it gets full enough.

The Multiplication_Packer is, at its heart, an arithmetic coder without statistical modeling.  So understanding this small set of functions may be a good introduction to more sophisticated compression.

By way of quick comparison: packing together 10 of our entity types using the Bit_Packer would have required 30 bits; using the Multiplication_Packer, we need only 24 bits, a savings of 20%.  (People who use a byte per value will require a whopping 80 bits; we are totally kicking those peoples' asses).

Something to Notice

It's important to see that the Byte_Packer and the Multiplication_Packer are essentially doing the same thing to stick values together.  The Byte_Packer uses shifting and a bitwise OR; these are special cases of multiplication and addition for when the range of input values is a power of 2.  The Multiplication_Packer is simpler, more general, and in some sense more pure.  But as game programmers we tend to think of numbers in a computer as being stored in binary, so we tend to think of the Bit_Packer first.  It's important not to be trapped in the binary-number mode of thought.  Sure, the CPU gives us operations that twiddle individual bits very quickly, and we know that internally, that's how it likes to store things.  But so what.  That thing held in the CPU register is a *number*, not a string of bits.

Byte Order

Programmers are used to confronting byte-order (endianness) issues when considering different platforms.  But this month's sample code has no such issue; it will work correctly regardless of the endianness of the underlying CPU.  This is because we are using mathematical operations to carve chunks off of numbers and store them as units no larger than a byte each.  That array of bytes can be written by an Athlon, and read back by a Sparcstation, and everything will work fine.

I have in the past seen bit-packing implementations that did not work this nicely.  They would treat an input value (32 bits maximum) as a string of bits in memory, and copy that string of bits into the destination buffer.  But because the format of bits in memory is not invariant, they have extra work to do when the time comes to port their code (and that extra work can get pretty messy).

Listing 1:

typedef unsigned long u32;

typedef unsigned char byte;

struct Bit_Packer {

int next_bit_to_write;     // Initialized to 0

byte buffer[BUFFER_SIZE];  // All initialized to 0

...

};

void Bit_Packer::pack(int num_bits_to_write, u32 value) {

while (num_bits_to_write > 0) {

u32 byte_index = (next_bit_to_write / 8);

u32 bit_index =  (next_bit_to_write % 8);

u32 src_mask = (1 << (num_bits_to_write - 1));

byte dest_mask = (1 << (7 - bit_index));

next_bit_to_write++;

num_bits_to_write--;

}

}

Listing 2:

struct Multiplication_Packer {

u32 accumulator;    // Initialized to 0

void pack(u32 limit, u32 value);

u32 unpack(u32 limit);

};

void Multiplication_Packer::pack(u32 limit, u32 value) {

accumulator = (limit * accumulator) + value;

}

u32 Multiplication_Packer::unpack(u32 limit) {

u32 quotient = accumulator / limit;

u32 remainder = accumulator % limit;

accumulator = quotient;

return remainder;

}