John Carmack on Inlined Code

(Here begins a comment written by John Carmack in 2014, about an email he sent to a programmers’ mailing list in 2007.)

In the years since I wrote this, I have gotten much more bullish about pure functional programming, even in C/C++ where reasonable: (link)

The real enemy addressed by inlining is unexpected dependency and mutation of state, which functional programming solves more directly and completely. However, if you are going to make a lot of state changes, having them all happen inline does have advantages; you should be made constantly aware of the full horror of what you are doing. When it gets to be too much to take, figure out how to factor blocks out into pure functions (and don.t let them slide back into impurity!).

As an additional confirmation of the point of the article, a couple years later when I was working on the Doom 3 BFG edition release, the exactly predicted off-by-one-frame-of-latency input sampling happened, and very nearly shipped. That was a cold-sweat moment for me: after all of my harping about latency and responsiveness, I almost shipped a title with a completely unnecessary frame of latency.

To make things more complicated, the do always, then inhibit or ignore strategy, while a very good idea for high reliability systems, is less appropriate in power and thermal constrained environments like mobile.

John Carmack
September 26, 2014

From: John Carmack johnc@idsoftware.com
Date: Tue, Mar 13, 2007 at 4:17 PM
Subject: inlining code

This is going to be an unusual email – I want to talk about coding style. I’m not going to issue any mandates, but I want everyone to seriously consider some of these issues. I’m not talking about petty things like spaces around operators, bracing styles, or pointers by type or variable (although we probably should settle that one), but about larger organization of code. There aren’t any silver bullets, but months can be shaved off of a development project with a few percent improvement in productivity.

This email was getting too long, so I am going to follow up with some other thoughts later. I have a lot of general things to discuss, but there is a specific tactical direction that I want to advocate.

A year ago I was involved in a discussion about writing extremely reliable software for aerospace applications, as I have been several times over the years. Typically I am there to rail against the people that talk about using threads and an RTOS for such things, when a simple polled loop that looks like a primitive video game is much more clear and effective. However, this particular discussion brought up a new wrinkle for me:

Indeed, if memory serves (it’s been a while since I read about this)…

The fly-by-wire flight software for the Saab Gripen (a lightweight fighter) went a step further. It disallowed both subroutine calls and backward branches, except for the one at the bottom of the main loop. Control flow went forward only. Sometimes one piece of code had to leave a note for a later piece telling it what to do, but this worked out well for testing: all data was allocated statically, and monitoring those variables gave a clear picture of most everything the software was doing. The software did only the bare essentials, and of course, they were serious about thorough ground testing.

No bug has ever been found in the “released for flight” versions of that code.
                                                     Henry Spencer
                                                  henry@spsystems.net

Now, a great deal of stuff that goes on in the aerospace industry should not be emulated by anyone, and is often self destructive. Most of you have probably read various popular articles about the development process that produces the space shuttle software, and while some people might think that the world would be better if all software developers were that “careful”, the truth is that we would be decades behind where we are now, with no PC’s and no public internet if everything was developed at that snail’s pace. This particular anecdote seemed to have some practical value, so I decided to give it a try. The flight control code for the Armadillo rockets is only a few thousand lines of code, so I took the main tic function and started inlining all the subroutines. While I can’t say that I found a hidden bug that could have caused a crash (literally…), I did find several variables that were set multiple times, a couple control flow things that looked a bit dodgy, and the final code got smaller and cleaner.

After living with the code in that style for quite some time, I haven’t found any drawbacks to it, and I have started applying the general approach to my code at Id. In lots of places there is a choice between organizing code a few ways:


------- style A:
 
void MinorFunction1( void ) {
}
 
void MinorFunction2( void ) {
}
 
void MinorFunction3( void ) {
}
 
void MajorFunction( void ) {
        MinorFunction1();
        MinorFunction2();
        MinorFunction3();
}
 
--------- style B:
 
void MajorFunction( void ) {
        MinorFunction1();
        MinorFunction2();
        MinorFunction3();
}
 
void MinorFunction1( void ) {
}
 
void MinorFunction2( void ) {
}
 
void MinorFunction3( void ) {
}
 
 
---------- style C:
 
void MajorFunction( void ) {
        // MinorFunction1
 
        // MinorFunction2
 
        // MinorFunction3 
}

I have historically used “style A” to allow for not prototyping in all cases, although some people prefer “style B”. The difference between the two isn’t of any consequence. Michael Abrash used to write code in “style C”, and I remember actually taking his code and converting it to “style A” in the interest of perceived readability improvements. At this point, I think there are some definite advantages to “style C”, but they are development process oriented, rather than discrete, quantifiable things, and they run counter to a fair amount of accepted conventional wisdom, so I am going to try and make a clear case for it. There isn’t any dogma here, but considering exactly where it is and isn’t appropriate is worthwhile.

In no way, shape, or form am I making a case that avoiding function calls alone directly helps performance.

An exercise that I try to do every once in a while is to “step a frame” in the game, starting at some major point like common->Frame(), game->Frame(), or renderer->EndFrame(), and step into every function to try and walk the complete code coverage. This usually gets rather depressing long before you get to the end of the frame. Awareness of all the code that is actually executing is important, and it is too easy to have very large blocks of code that you just always skip over while debugging, even though they have performance and stability implications.

C++ is no friend of this goal, with operator overloading, implicit constructors, and so on. A lot of things done in the name of flexibility are somewhat misguided, and at the root of a lot of development problems. Games, with the continuously recurring real time tic structure, also have some goals and constraints that encourage a programming style somewhat different than, say, a content creation application or transaction processor.

If something is going to be done once per frame, there is some value to having it happen in the outermost part of the frame loop, rather than buried deep inside some chain of functions that may wind up getting skipped for some reason. For example, our usercmd_t generation code is buried off of asyncServer, and it really should be in the main common loop. Related to this is a topic of hardware versus software design – it is often better to go ahead and do an operation, then choose to inhibit or ignore some or all of the results, than try to conditionally perform the operation. The usercmd_t generation relates to this (and other messes related to the interaction with game over bindsets), where the usercmd_t are only generated when they “need to be”.

The way we have traditionally measured performance and optimized our games encouraged a lot of conditional operations – recognizing that a particular operation doesn’t need to be done in some subset of the operating states, and skipping it. This gives better demo timing numbers, but a huge amount of bugs are generated because skipping the expensive operation also usually skips some other state updating that turns out to be needed elsewhere.

We definitely still have tasks that are performance intensive enough to need optimization, but the style gets applied as a matter of course in many cases where a performance benefit is negligible, but we still eat the bugs. Now that we are firmly decided on a 60hz game, worst case performance is more important than average case performance, so highly variable performance should be looked down on even more.

It is very easy for frames of operational latency to creep in when operations are done deeply nested in various subsystems, and things evolve over time. This can lay hidden as a barely perceptible drop in input quality, or it can be blatantly obvious as a model trailing an attachment point during movement. If everything is just run out in a 2000 line function, it is obvious which part happens first, and you can be quite sure that the later section will get executed before the frame is rendered.

Besides awareness of the actual code being executed, inlining functions also has the benefit of not making it possible to call the function from other places. That sounds ridiculous, but there is a point to it. As a codebase grows over years of use, there will be lots of opportunities to take a shortcut and just call a function that does only the work you think needs to be done. There might be a FullUpdate() function that calls PartialUpdateA(), and PartialUpdateB(), but in some particular case you may realize (or think) that you only need to do PartialUpdateB(), and you are being efficient by avoiding the other work. Lots and lots of bugs stem from this. Most bugs are a result of the execution state not being exactly what you think it is.

Strictly functional functions that only read their input arguments and just return a value without examining or modifying any permanent state are safe from these types of errors, and the nice ability to formally speak about them makes them a good ivory tower topic, but very little of our real code falls into this category. I don’t think that purely functional programming writ large is a pragmatic development plan, because it makes for very obscure code and spectacular inefficiencies, but if a function only references a piece or two of global state, it is probably wise to consider passing it in as a variable. It would be kind of nice if C had a “functional” keyword to enforce no global references.

Const parameters and const functions are helpful in avoiding side effect related bugs, but the functions are still susceptible to changes in the global execution environment. Trying to make more parameters and functions const is a good exercise, and often ends in casting it away in frustration at some point. That frustration is usually due to finding all sorts of places that state could be modified that weren’t immediately obvious – places for bugs to breed.

C++ object methods could be thought of as almost-functional, with an implicit overwriting assignment on return, but with larger objects that contain lots of variables you don’t have much awareness of what is modified by the method, and again, there is no guarantee that the function isn’t running off and doing something horribly global, like parsing a decl.

The function that is least likely to cause a problem is one that doesn’t exist, which is the benefit of inlining it. If a function is only called in a single place, the decision is fairly simple.

In almost all cases, code duplication is a greater evil than whatever second order problems arise from functions being called in different circumstances, so I would rarely advocate duplicating code to avoid a function, but in a lot of cases you can still avoid the function by flagging an operation to be performed at the properly controlled time. For instance, having one check in the player think code for health <= 0 && !killed is almost certain to spawn less bugs than having KillPlayer() called in 20 different places.

On the subject of code duplication, I tracked all the bugs (problems that surfaced after I thought everything was working correctly) I fixed for a while, and I was quite surprised at how often copy-paste-modify operations resulted in subtle bugs that weren’t immediately obvious. For small vector operations on three or four things, I would often just past and modify a few characters like this:

v[0] = HF_MANTISSA(*(halfFloat_t *)((byte *)data + i*bytePitch+j*8+0));
v[1] = HF_MANTISSA(*(halfFloat_t *)((byte *)data + i*bytePitch+j*8+1));
v[2] = HF_MANTISSA(*(halfFloat_t *)((byte *)data + i*bytePitch+j*8+2));
v[3] = HF_MANTISSA(*(halfFloat_t *)((byte *)data + i*bytePitch+j*8+3));

I now strongly encourage explicit loops for everything, and hope the compiler unrolls it properly. A significant number of my bugs related to things like this, and I am now rethinking even two dimensional cases where I commonly use discrete _X, _Y or _WIDTH, _HEIGHT variables. I find that much easier to read than a two element array, but it is difficult to argue with my data on how often it made me screw up. Some practical Matters —

Using large comment blocks inside the major function to delimit the minor functions is a good idea for quick scanning, and often enclosing it in a bare braced section to scope the local variables and allow editor collapsing of the section is useful. I know there are some rules of thumb about not making functions larger than a page or two, but I specifically disagree with that now – if a lot of operations are supposed to happen in a sequential fashion, their code should follow sequentially.

Enclosing pages of code inside a conditional or loop statement does have readability and awareness drawbacks, so leaving that code in a separate function may still be justified, but in some cases it is still possible to move the code to another place where its execution wouldn’t be conditional, or just execute it at all times and inhibit the results in some way with a very small conditional block. An execute-and-inhibit style usually takes more absolute time, but it reduces the variability in frame times, and eliminates a class of bugs.

Inlining code quickly runs into conflict with modularity and OOP protections, and good judgment must be applied. The whole point of modularity is to hide details, while I am advocating increased awareness of details. Practical factors like increased multiple checkouts of source files and including more local data in the master precompiled header forcing more full rebuilds do need to be weighed. Currently I am leaning towards using heavyweight objects as the reasonable break point for combining code, and trying to reduce the use of medium sized helper objects, while making any very lightweight objects as purely functional as possible if they have to exist at all.

To sum up:

If a function is only called from a single place, consider inlining it.

If a function is called from multiple places, see if it is possible to arrange for the work to be done in a single place, perhaps with flags, and inline that.

If there are multiple versions of a function, consider making a single function with more, possibly defaulted, parameters.

If the work is close to purely functional, with few references to global state, try to make it completely functional.

Try to use const on both parameters and functions when the function really must be used in multiple places.

Minimize control flow complexity and “area under ifs”, favoring consistent execution paths and times over “optimally” avoiding unnecessary work.

Discussion?

John Carmack

Braid Particle System Revision (part 1)