The Inner Product, January 2003
Jonathan Blow (email@example.com)
Last updated 17 January 2004
Profiling, Part 2: Interactive Profiling, Part 1
Last month I talked about the sorts of low-production-value profilers we often need to build into our games. I described these as "interactive profilers", distinguishing them from the available commercial products, which I called "batch profilers".
I described several ways that an interactive profiler can be helpful when a batch profiler cannot. A big problem with batch profilers is that they average measurements across the profiling session. So if the program changes its overall behavior several times during the session, with a batch profiler you see some kind of average behavior which may not be much like the specific behaviors you want to debug.
To illustrate this, Table 1 shows an example of three different behaviors a program might exhibit, and the resulting analysis you'd get from a program like VTune. During any given frame, the readout of an interactive profiler will clearly show one of these three behaviors, with a single dominant function eating most of the CPU time. The result given by the batch profiler shows no such dominance.
Table 1. Code Module | Behavior A | Behavior B | Behavior C | Average
Now suppose that behavior C is actually an error, and it should not be happening at all. Given a clear picture of these numbers, we see that there is a dominant function in behavior C and we have a hope of understanding why that function is called. The result of the batch profiler shows three functions of roughly equal importance, and it's unclear how to proceed in optimizing the system. Of course, actual modern games are even less clear than this, since they involve longer lists of profiling numbers (much longer!) and a more varied gamut of behavior.
So like I said last month, you can use an interactive profiler to help clarify problems like this. But historically, interactive profilers have provided minimal functionality -- a list of program regions and CPU times, and not much else. The numbers they present to us are ephemeral: either they exist for a frame and then disappear, or they're averaged over several frames, hiding spikes that are important to detect. This is an interesting conundrum in interactive profiling: we want to add hysteresis to the reported data, so that it is easier to read (as with the IIR filters last month); but that hysteresis can hide important events. It makes the profiler less interactive, and more batch-oriented, but without the stored-data advantages of batch profiling.
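The hysteresis trade-off above can be sketched concretely. Here is a minimal one-pole IIR filter (an exponential moving average, like last month's filters) that smooths a zone's time for readability, with a separate max tracker so spikes aren't hidden by the smoothing. The struct and field names are illustrative, not from any actual profiler source:

```cpp
#include <algorithm>

// One profiled zone's displayed time: an IIR-smoothed value for readability,
// plus the worst single frame seen, so that smoothing doesn't hide spikes.
struct SmoothedTimer {
    double smoothed = 0.0;  // IIR-filtered time; stable and easy to read on screen
    double spike    = 0.0;  // worst single-frame time observed so far
    double k;               // filter coefficient in (0, 1]; smaller = more hysteresis

    explicit SmoothedTimer(double k_) : k(k_) {}

    void update(double frame_time) {
        // Standard one-pole IIR: move a fraction k toward the new sample.
        smoothed = smoothed + k * (frame_time - smoothed);
        // Track the maximum separately so spikes stay visible.
        spike = std::max(spike, frame_time);
    }
};
```

Displaying the max alongside the filtered value is one cheap way to keep the readout stable without losing the important events entirely.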
So with the common sort of interactive profiler, we lose one of the strengths of batch profiling: the capability for precise and unhurried analysis of the available numbers. We can see changing behaviors there in the live profiling data, but they're difficult to quantify. How much time is being spent in behavior A, and how much in behavior B? How frequently is some particular performance spike occurring, and how long does the spike last?
In a sense, we want to quantify the behavior of the profiling numbers at a meta level. The raw list of numbers for any particular frame tells us how long the program spent drawing the world, running AI routines, performing collision detection, and such. As we saw with Table 1, those numbers can change significantly during a run. So we really want to step up a level, and not limit ourselves to speaking only about elements of the raw timing data. We want to look at the array of data from any given frame and treat it as a coherent set of numbers, as an abstract object in its own right. That coherent set of numbers is a way to quantify what I have been calling a "behavior". We want to talk about how consistently the game produces that behavior; and when it produces some other behavior, what precisely that means. We want solid quantitative information about when and how often the program exhibits each behavior. Thus we will have elevated ourselves to a level where we are taking concrete measurements about formerly abstract ideas like "the fill-rate-limited behavior" and "the physics-heavy behavior", and no longer trying to make inferences by looking only at timing data for individual program zones.
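One way to treat a frame's timing data as a coherent object, as described above, is to view it as a vector and compare frames by direction rather than magnitude. Normalizing means "50% renderer, 30% AI, 20% collision" reads as the same behavior whatever the total frame time. This is a sketch under that assumption; the function names are hypothetical:

```cpp
#include <cmath>
#include <vector>

// Scale a frame's zone-time vector to unit length, so that only the *mix* of
// where time went matters, not the total amount of time.
std::vector<double> normalize(std::vector<double> v) {
    double len = 0.0;
    for (double x : v) len += x * x;
    len = std::sqrt(len);
    if (len > 0) for (double &x : v) x /= len;
    return v;
}

// Cosine similarity between two frames' timing vectors: 1.0 means an identical
// mix of zone times; near 0 means time was spent in completely different places.
double behavior_similarity(const std::vector<double> &a, const std::vector<double> &b) {
    std::vector<double> na = normalize(a), nb = normalize(b);
    double dot = 0.0;
    for (size_t i = 0; i < na.size(); i++) dot += na[i] * nb[i];
    return dot;
}
```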
If this elevated data is presented in an easy-to-perceive way, it will become an important tool for understanding just what our program does. (Justification of the importance of this is coming up in the next section!)
This month I want to explore the kinds of analysis capabilities we might want in an interactive profiler; next month I will follow up with some concrete implementation. Our goal will be to introduce as much analysis as possible without slowing down the running game. (Maintaining a high frame rate during profiling is important for reasons discussed last month.)
What do we want?
To design appropriate analysis tools, we first need to clearly define what we want the tools to do for us. This may take some thinking; as games have evolved, our profiling needs have changed also.
One important observation is that instruction-level profiling is no longer as important as it once was. One reason for this is that modern CPUs are unpredictable; it's harder than ever to make a program faster by applying minor code tweaks. But there's also a deeper reason.
It's a piece of ancient programmer wisdom that we should not resort to peephole optimization (what instruction-level profiling is good for) until the options for algorithmic optimization have been carefully explored; high-level changes to an algorithm can functionally invalidate low-level optimizations, while having an overall impact on the execution time that dwarfs the gain of those low-level tweaks.
As time goes on, algorithms used in games are getting more complicated; thus, the number of degrees of freedom for possible algorithmic changes is constantly increasing. By that I mean, we have more choices of different ways to alter an algorithm while still accomplishing the same goal. With so many degrees of freedom, and with limited manpower to spend on optimization, we can easily exhaust our work budget while exploring algorithmic changes, never reaching the low-level optimization stage for most of our game's subsystems. So we want a framework that encourages and supports profiling at an algorithmic level, not an instruction level.
Also, traditional profilers assume that a program's behavior is somewhat orderly, and that a static listing of CPU times is sufficient to understand its behavior. But now games are so complicated that we often don't have a good idea of what they're doing internally, even when they seem to work. One example: it's easy to accidentally draw all the entities in the world twice per frame. This might happen for simple reasons, like big blocks of code getting moved around while a function call gets duplicated. In this case the entity rendering routines get called twice, but it's hard to see because the calls are in different files, a few function calls' depth apart from each other; if multipass rendering is being used, there will already be multiple calls to some entity drawing routines that you really do want to be there, further confusing the situation. A subtler way to render entities in duplicate: you collect them by portal navigation through the world, but the ones already collected don't get correctly marked, so odd combinations of portals cause entities to be added to the render list multiple times.
Another example: it's easy to perform collision detection against an excessive number of not-so-nearby objects. Either bounding volumes for the objects are unintentionally large, or the data structure they're being stored in is overly loose. In both this and the duplicate-drawing case, the rendered scene appears to be correct, but the game has taken a lot of extra time to generate it.
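A built-in profiler can cheaply audit for the duplicate-drawing bug described above by counting submissions per entity per frame and flagging anything drawn more often than the multipass renderer legitimately requires. This is a sketch; the names and the pass limit are assumptions for illustration:

```cpp
#include <unordered_map>

struct DrawAudit {
    std::unordered_map<int, int> draws_this_frame;  // entity id -> submit count
    int max_legit_passes = 2;  // e.g. base pass plus one lighting pass (assumed)

    void begin_frame() { draws_this_frame.clear(); }

    // Call this wherever an entity is submitted for rendering.
    // Returns false once the entity has been submitted suspiciously often.
    bool record_draw(int entity_id) {
        int n = ++draws_this_frame[entity_id];
        return n <= max_legit_passes;
    }
};
```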
So I will say that one of the main purposes of a modern profiler should be to act as a tool to understand the behavior of a program; to verify that the program is running correctly or to understand how it is wrong. Even when performance problems are not due to bugs, they often exist because we don't quite understand the performance ramifications of our algorithms when given complex input data. So again, the chief job of the profiler is to help provide understanding.
Effective utilization of such profiling data requires an approach that is a little less direct than what we had in the past. With instruction-level optimization, the profiler can point you at exactly the piece of code you should be concerned with. But when optimizing algorithmically, the code where the time is being spent is not usually what you care about; you want to know _why_ time is being spent there, so that you can change the program's overall behavior, to call that code less or eliminate it entirely.
An example of this is the rendering of stencil shadow volumes. Suppose the profiler tells us we're spending a lot of time computing the silhouettes of the objects in our game world, so that the shadow planes can be extruded from them. Once we are convinced that the silhouette computation is not grossly incompetent, we look for algorithmic solutions to the problem, which will likely yield much greater speed-ups than trying to optimize the silhouette computation itself. One such speed-up is to more aggressively cull light sources whose shadow volumes cannot be seen. With fewer light-to-object interactions, we call the silhouette routine less often, and the game goes faster.
While understanding the behavior of our program is a very important goal, right now we have no tools for that purpose. That seems like a bad thing.
How we might use such a profiler
So what specifically do we want the tools to do for us when we sit down in front of the computer to profile?
Firstly, I'd like to be able to casually run the game, without needing to prepare a special build, and without a large framerate hit, and get profiling data -- this allows me to just walk around in the game world and do things that test performance characteristics.
I'd like the profiler to detect the various ways that the program behaves, and to provide for me an abstract list of those behaviors with quantitative results attached. After I run around the game world doing a bunch of stuff, I want the profiler to give me something like Table 1, and say "you were in behavior A 50% of the time, B 35% of the time, and C 15% of the time." Each of A, B and C consists of a separate set of profiling data, like a batch profiler would give us for an entire run. Now we are viewing things from a higher level, and would like to know how important each behavior is, and how they interact. It would be nice to see a graph of which behavior was happening when, so I can correlate the changes with things that happened in-game.
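The "which behavior is this frame?" report above could be produced by classifying each frame against a set of behavior centroids (vectors of typical zone timings; how those centroids get built is an implementation question for next month). A nearest-centroid sketch, with hypothetical names:

```cpp
#include <cmath>
#include <vector>

// Euclidean distance between two zone-timing vectors.
double distance(const std::vector<double> &a, const std::vector<double> &b) {
    double d = 0.0;
    for (size_t i = 0; i < a.size(); i++) {
        double t = a[i] - b[i];
        d += t * t;
    }
    return std::sqrt(d);
}

// Assign one frame's timings to the closest known behavior, and accumulate
// the per-behavior frame counts that produce "A 50%, B 35%, C 15%" reports.
int classify_frame(const std::vector<double> &frame,
                   const std::vector<std::vector<double>> &centroids,
                   std::vector<int> &frame_counts) {
    int best = 0;
    for (size_t i = 1; i < centroids.size(); i++)
        if (distance(frame, centroids[i]) < distance(frame, centroids[best]))
            best = (int)i;
    frame_counts[best]++;
    return best;
}
```

Logging the returned index each frame also gives the over-time graph of which behavior was active when.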
Once a well-defined set of behaviors has been measured, I'd like the facility to mark some of them as expected/acceptable behaviors, and some of them as anomalous/unacceptable behaviors. The profiler can provide statistics about how much time has been spent in unacceptable behaviors. Suppose the game has a journaling system; I am running a recorded session several times to benchmark performance as I attempt several optimizations, to see how effective those optimizations are. I can watch the percentage of time being spent in unacceptable behaviors, and the CPU measurements for those behaviors in isolation; this may provide much more concrete and useful information than a batch-profile-style average.
Again if the game has a journaling system, I'd like to mark certain behaviors as things I want to debug. Then I can start playing back the journaled game events, and when the game reaches one of the target behaviors, the playback pauses. This allows me to quickly access the problematic game states, and then look at the game world to infer what may be causing the problem. In the stencil shadow example, perhaps during a behavior when the silhouette-finding takes unreasonable amounts of the CPU, there don't appear to be an unusual number of light sources or objects in the scene. I then turn on wireframe rendering, and see that far in the distance, in areas that are occluded from view, are many light sources and objects. I can then deduce that there is an error in the occlusion culling code. You could also find this kind of problem without a journaling system or analytical set of behaviors, by walking around the world and watching the live numbers. But the idea is that the extra analytical capability helps us find problems more quickly, and in a more reproducible way, and also helps detect problems that are subtler.
I'd like to have some querying ability about the behaviors. I want to be able to ask things like, "which behaviors contain a lot of silhouette computation?", then look at those behaviors as a group to understand them better. Perhaps I want to perform queries with boolean operations, like "which behaviors contain a lot of 'collision_detect' and NOT a lot of 'physics_simulate'?" This would indicate unusual situations where objects collided a lot without moving far between collisions, which is a behavior case we might want to spend a lot of effort to alleviate.
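The boolean query above ("a lot of 'collision_detect' AND NOT a lot of 'physics_simulate'") might look like this, assuming each behavior stores average time per named zone. The threshold and field names are arbitrary illustrative choices:

```cpp
#include <map>
#include <string>
#include <vector>

struct Behavior {
    std::map<std::string, double> zone_ms;  // average ms per zone in this behavior
};

// "A lot" here means above some ms threshold; 3 ms is an assumed placeholder,
// not a recommendation.
bool lots_of(const Behavior &b, const std::string &zone, double threshold_ms = 3.0) {
    auto it = b.zone_ms.find(zone);
    return it != b.zone_ms.end() && it->second > threshold_ms;
}

// Return the indices of behaviors matching the example boolean query.
std::vector<int> query(const std::vector<Behavior> &behaviors) {
    std::vector<int> result;
    for (size_t i = 0; i < behaviors.size(); i++)
        if (lots_of(behaviors[i], "collision_detect") &&
            !lots_of(behaviors[i], "physics_simulate"))
            result.push_back((int)i);
    return result;
}
```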
If I am investigating a certain behavior, I may wish to ask "which behaviors are like this one?", to find other cases that may shed light on the current situation; or I may wish to ask "which behaviors are very much unlike this one?", to see how far the game's behavior can deviate from what I am studying.
To discover performance patterns, I want the profiler to detect correlations between timing values. Suppose time spent in the dynamic texture generator is roughly proportional to time spent drawing particle systems, and both of those correlate significantly with increased frame time. That would indicate that the art guys are using too many procedurally generated textures in the particle systems, and they need to cut down on those; but perhaps the other dynamically generated textures do not represent a performance problem (say, the ones used for rippling water).
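Correlation detection like this could use the standard Pearson coefficient over per-frame samples of two timing channels; a value near +1 would flag the texture-generation / particle-drawing relationship described above. A minimal sketch:

```cpp
#include <cmath>
#include <vector>

// Pearson correlation coefficient between two equal-length sample series,
// e.g. per-frame times of two profiled zones. Returns a value in [-1, 1].
double pearson(const std::vector<double> &x, const std::vector<double> &y) {
    size_t n = x.size();
    double mx = 0, my = 0;
    for (size_t i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;
    double sxy = 0, sxx = 0, syy = 0;
    for (size_t i = 0; i < n; i++) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    if (sxx == 0 || syy == 0) return 0.0;  // a constant channel correlates with nothing
    return sxy / std::sqrt(sxx * syy);
}
```

Running this over every pair of zones each profiling session is O(zones² × frames), which is one reason the analysis should happen offline or incrementally rather than every frame.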
Since the profiler is built into the game, we can expose game variables to the profiler. So instead of just classifying behaviors based on timing data, it can also look at "the number of monsters currently running around" or "the number of nodes in the potentially visible set". It can find correlations between these variables and the timing data, or perhaps just among the variables themselves. With that capability, we have a powerful tool tremendously different from anything we've had yet.
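Exposing game variables could be as simple as registering a name and a pointer, with the profiler sampling every registered variable once per frame alongside the timing data. The names here are illustrative assumptions:

```cpp
#include <string>
#include <vector>

struct ExposedVariable {
    std::string name;
    const int *value;  // points at a live game variable, e.g. a monster count
};

struct ProfilerVariables {
    std::vector<ExposedVariable> vars;

    // Called once at startup for each variable the game wants profiled,
    // e.g. expose("num_monsters", &num_monsters);
    void expose(const std::string &name, const int *value) {
        vars.push_back({name, value});
    }

    // Called once per frame; the snapshot joins that frame's timing record,
    // so correlations between game state and CPU times can be computed later.
    std::vector<int> sample() const {
        std::vector<int> snapshot;
        for (const auto &v : vars) snapshot.push_back(*v.value);
        return snapshot;
    }
};
```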
This month's article has consisted basically of a big wish list. Next month, we will look very concretely at the implementation of an interactive profiler that attempts to satisfy these wishes.