The Inner Product, October 2002
Jonathan Blow (email@example.com)
Last updated 17 January 2004
Toward Better Scripting, Part 1
My Friend, the Covariance Body
Toward Better Scripting, Part 2 (November 2002)
Every year we need to create larger amounts of content for our game worlds, and each piece of content becomes richer. As a result of this trend, scripting languages are becoming increasingly important. Hopefully, a scripting language enables the content builder to make things happen easily by expressing ideas in a high-level manner. And ideally, a faulty script won't crash the game and disrupt workflow.
With subjects like graphics or AI, we often turn to academic research for solutions. This is a good idea for programming languages too, but it doesn't work as well. It seems that most programming language research doesn't have much to say about making things happen in a game world. Much of what we want is highly domain-specific, so it seems we should apply our domain experience to come up with ideas for solutions that other people wouldn't think of.
Improving the feature set of a scripting language is a difficult thing to think about; we don't have many good examples, since most familiar languages are made for a different kind of programming than we want. So in this article and the next, I'm going to concoct an experimental programming language feature. My goal is to show an example of something that could be useful for games that we would be unlikely to see in a general-purpose programming language. I also want to play a little bit with the feature, and show the interaction between domain-specific concerns and language design.
I like 3D games, which are about things that happen over periods of time inside some region of space. Traditional programming languages are poor at dealing with time and space, so they should be good subjects to explore for domain-specific improvements.
We often want to talk about the behavior of things that change over time. We want to ask questions like "Has the player been closely following the messenger he's supposed to spy on, for long enough to complete the mission?"; "Have there been enough combat events lately to keep the player interested, or should we increase the number of wandering monsters?"; "Is the player making continual progress toward the goal, or should we guide him more?" Programming these kinds of things is usually a lot of work. We need to make a piece of code that runs every frame and measures some things about the state of the world, then maybe records those values somewhere so that we can look at them in the future, or averages them together into some quantity that we then make decisions about. These systems often grow to be unwieldy, and they easily pollute other portions of the code base (we'll see some examples of contamination next month).
In most programming languages, when we assign a variable to a new value, the old value is completely erased. But if the language possessed some kind of short-term memory about how the variable has been behaving, we could use that facility to answer a lot of time-based questions. And maybe in the end, our scripting language will be a little more organic, a little less mechanical.
I will implement this memory by building it into the variable handler for a scripting system. Since it's in the core of the interpreter, we can invest effort to make the feature high-quality (e.g. mathematically well defined and frame-rate independent), and the benefits will propagate through the system: everyone who performs time-related queries will get high quality results.
I am choosing this feature because it is kind of cool, it's something I've never seen before, and it's a facility you could imagine existing in almost any language. I'm not choosing it as a "magic bullet" to solve all our scripting problems; time-related queries are not necessarily the biggest content development bottleneck. But by exploring features like this, step by step, we can eventually arrive at a programming paradigm very different from what we have now, a paradigm that is much more productive for us.
One of the main tasks of a scripting system is to manage communication between the core game engine (fast, low-level code) and the running scripts (slow, high-level code).
So that the scripting system can perform code execution and runtime type-checking, each variable accessible by the scripting language is stored in a table, and the entry for that variable is a data record containing more than just the variable's value: for example, a string that tells you its name, and an integer telling you its type (boolean, real number, integer, entity ID, etc).
I will store the history information inside this data record. History will automatically be available at all times for every variable in the system, without requiring the programmer to explicitly ask for variables to be tracked. Behind the scenes, the engine must be updating the history of all variables defined in the system, so the history update needs to be a fast operation. The majority of these computations will produce values that are never referenced, and many optimization-obsessed programmers will grimace at that notion. But I want the history information to be easy to use and maximally dependable. The programmer should expend exactly zero extra effort to get at it. I don't want the unreliable behavior that would happen otherwise. Imagine that a script is dynamically loaded, and that script suddenly wants to know how a certain variable has been behaving for the past 30 seconds; but the information isn't available because analysis of the previously loaded scripts told us that nobody would care about that variable's history. This is a specific case of a pattern I call optimization impeding progress. We end up spending huge amounts of time debugging, and not enough time actually making the game.
I don't want come across as an anti-optimization bigot; optimization is very important in creating smoothly-running systems. But we always need to weigh the benefit of an optimization against the damage it causes; we need to be cognizant of the house-of-cards complexity that we tend to build up after many optimizations. The talent in being a good modern software engineer is in choosing the few best optimizations, not in optimizing everything. To optimize our optimization efforts, as it were.
It's my expectation that the amount of CPU we spend tracking history will be vastly overshadowed by time spent running actual script code; optimization in this case would be unproductive.
We could devise a system where, every frame, we sample the current value of some variable and store it in an array. Then scripting code could look over this array and make decisions every frame. But storing a large number of samples for every variable is costly, and it's unclear how we should process that sample array consistently and meaningfully (for example, without variable frame rate messing us up). We might be led toward methods involving signal reconstruction from nonuniform samples, a slow and painful activity.
So instead of recording many exact values, I want to record a small number of approximate values that tell us about the trend of a variable over time. As a result, the system can answer questions about the general behavior of a value, but can't say what the value precisely was at a given time.
Some time ago, a co-worker showed me a simple hack for smoothing rapidly-changing values. If x is some fluctuating input value, and the ith sample of x is xi, then yi = kyi-1 + (1-k) xi, where 0 ≤ k < 1 is a value that changes more smoothly. Another way to interpret "more smoothly" is "takes some time to converge to the input value", where the amount of time is controlled by k. (This equation is a simple low-pass IIR filter). This construct originally came up when we were drawing frame counters; instead of displaying the instantaneous frame rate, which is difficult to read since it changes so quickly, we would smooth the frame rate by two different values of k, to show short-term and medium-term trends.
This filter is much like what I want, except that the rate of convergence here depends on the number of iterations you run; so if you perform this operation once per frame, the rate of convergence depends heavily on the frame rate. But you can derive a less-hacky, frame-rate-independent version of this. Make k a function of duration, then write an expression that says the results of filtering serially, first with k(Δt1) and then with k(Δt2), should be equal to filtering once with k(Δt1 + Δt2). Work some math and you get k(Δt) = (ba)Δt. Both b and a are parameters you can adjust, but for our purposes they comprise only one degree of freedom, as we'll see in a moment.
For each variable, I set up an array of these functions, where each function converges at a different speed. I'll control these by designating h = the amount of time it takes for the filter to converge to 90% of a fixed input value. Convergence is asymptotic, so the filter will never actually reach the input value. The values of h might range from, say, 0.1 seconds to 30 minutes.
The appropriate k is k(h, Δt) = (.11/h)Δt, where Δt is the frame time. The division and exponentiation look scary, but actually our system can be very fast. The value of k depends only on h and Δt; h is fixed permanently, and the frame time changes only once per frame. So when it's time to update the scripting system, we precompute the handful of values of k that we will be using. Then we iterate over all the variables, updating their history data using only addition and multiplication.
Once this mechanism is in place, we can ask questions about a scalar value, like "has it been generally increasing or decreasing over the last n seconds?", just by looking at the history data. You can imagine abstracted language primitives like Trend(variable, time_period) that might encapsulate this. We can also ask roughly "what has the average value of the variable been over the last t seconds?" (though it's actually unclear what a perfectly-well-defined notion of average would be; you get into signal windowing issues. But our version is pretty good.)
To answer questions about a variable for arbitrary times, we need to interpolate between the history values. For now we'll just use linear interpolation; maybe next month we will discuss alternatives.
We can straightforwardly extend all this filtering to vector values by performing the scalar history tracking on each coordinate of a vector. So we can also talk about the history of where entities have been in space.
It would be useful to know more than the average of some particular value. We might want to ask "Is this value steady, or is it fluctuating? Is it becoming more stable over time, or less stable?" We can answer these questions if we track the variance of our variables. We blend variances together over time, the same way we are computing means.
Variance of a scalar variable is another scalar, which you compute by averaging the squares of the input values. The variance of an n-vector is an n×n matrix, which you compute by averaging the "squares" of the vectors, which you get by taking the outer product of each vector with itself. (I have put "squares" in quotes here because there are several valid ways of multiplying vectors, one alternative being the Clifford product). For more detail about this outer product matrix, see last month's column, "My Friend, the Covariance Body". I called this a "variance-covariance matrix", among other things, which is what you should search for on the web for more information. In general, though, I prefer to think of this matrix as just being the variance of a vector variable.
The matrix is symmetric, so we only need to store the upper triangle, which is nice. Eigen-decomposition of the matrix yields an ellipse that tells us generally where in space a point has been. So we can use this to ask questions like "Has the player been exploring a wide area, or has he been sedentary? What's been the primary direction of his travel and has he been mostly going straight, or has he zig-zagged a lot?" You might think to answer these questions just by analyzing a log of averages of the player's position, but you run into aliasing problems that way. The variance carries information of a rich character that can be effectively leveraged using a small number of samples, as you can see by running the sample code.
Last month I presented the covariance body as an encapsulated concept, but sometimes it's better to violate this abstraction. Recall that, to combine two variance matrices, we need to shift them so that the variances are computed around a common point. Often this common point is the mean value of the inputs. But to make the system as simple and fast as possible, I have chosen to compute the variances of all values with respect to the origin. So we can just add the matrices together without shifting them. When the time comes to display a variance, then we compute the shifted matrix and perform the eigen-decomposition on that.
This month's code defines two variables that get updated in real time: one scalar representing the clock, and one 2D vector representing the position of the mouse pointer in the window. Once per frame, a history manager computes new averages and variances for these variables. Each frame the histories are drawn on the screen: a linear trail of white showing the pointer's average positions, and a series of blue ellipses showing the typical areas covered. Pressing 'S' will toggle an enforced slow-down of the frame rate; you can see that the behavior of the averaging stays consistent despite the frame rate changes.
Aside from the variable declaration system, there's no actual scripting language yet. We'll fix that next month, when we look at some operations we can support on history-augmented variables. We'll also examine some concrete applications of the history feature. There's no room to discuss the examples fully in this article, but in order to end with something tangible, I'll summarize them now.
The first example is a commentator for a game like Dance Dance Revolution. The game assigns you a momentary rating based on how well you hit each arrow; this rating fluctuates very quickly. But because the rating is automatically being tracked over time, the script can easily check for broad patterns in the history. So if you do poorly for a while, but then start doing very well, the commentator might say, "What a comeback! For a while, I thought you were going to lose!"
The second example is a mortar unit from a Real-Time Strategy game. It fires ballistic shells that take some time to reach the target; in order to hit, it should only fire when the target is staying within a relatively small area (though the target doesn't have to be unmoving! Imagine a soldier alternating between two positions in a trench.) Since every unit in the game has a position vector, and that vector is being remembered over time, the mortar vehicle can check how spread out the target's position has been over the past n seconds, and fire if the result is tight enough.
Next month we'll look at both these cases, see how they might be done traditionally, and then implement them using history information.