The C++ Sucks Series: the quest for the entry point

Suppose you run on the x86 and you don’t like its default FPU settings. For example, you want your programs to dump core when they divide by zero or compute a NaN, having noticed that on average, these events aren’t artifacts of clever numerical algorithm design, but rather indications that somebody has been using uninitialized memory. It’s not necessarily a good idea for production code, but for debugging, you can tweak the x86 FPU thusly:

//this is a Linux header using GNU inline asm
#include <fpu_control.h>
void fpu_setup() {
    unsigned short cw;
    _FPU_GETCW(cw);
    cw &= ~_FPU_MASK_ZM;//Divide by zero
    cw &= ~_FPU_MASK_IM;//Invalid operation
    _FPU_SETCW(cw);
}

So you call this function somewhere during your program’s initialization sequence, and sure enough, computations producing NaN after the call to fpu_setup result in core dumps. Then one day someone computes a NaN before the call to fpu_setup, and you get a core dump the first time you try to use the FPU after that point. Because that’s how x86 maintains its “illegal operation” flags and that’s how it uses them to signal exceptions.

The call stack you got is pretty worthless as you’re after the context that computed the NaN, not the context that got the exception because it happened to be the first one to use the FPU after the call to fpu_setup. So you move the call to fpu_setup to the beginning of main(), but help it does not. That’s because the offending computation happens before main, somewhere in the global object construction sequence. The order in which global objects from different translation units get constructed is undefined by the C++ standard. So if you’ll kindly excuse my phrasing - where should we shove the call to fpu_setup?

If you have enough confidence in your understanding of the things going on (as opposed to entering hair-pulling mode), what you start looking for is the REAL entry point. C++ is free to suck and execute parts of your program in “undefined” (random) order, but a computer still executes instructions in a defined order, and whatever that order is, some instructions ought to come first. Since main() isn’t the real entry point in the sense that stuff happens before main, there ought to be another function which does come first.

One thing that could work is to add a global object to each C++ translation unit, and have its constructor call fpu_setup(); one of those calls ought to come before the offending global constructor - assuming that global objects defined in the same translation unit will be constructed one after another (AFAIK in practice they will, although in theory the implementation could, for example, order the constructor calls by the object name, so they wouldn’t). However, this can get gnarly for systems with a non-trivial build process and/or decomposition into shared libraries. Another problem is that compilers will “optimize away” (throw away together with the side effects, actually) calls to constructors of global objects which aren’t “used” (mentioned by name). You can work around that by generating code “using” all the dummy objects from all the translation units and calling that “using” code from, say, main. Good luck with that.
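For what it’s worth, GCC and Clang offer a C-level cousin of the dummy-global-object trick: a function marked __attribute__((constructor)) runs before main, and a priority number orders it relative to other such functions in the same executable or shared object. A minimal sketch, assuming a GCC-compatible compiler; early_setup, later_init and startup_log are made-up names, and early_setup stands in for the call to fpu_setup():

```c
/* Sketch only: relies on the GCC/Clang constructor attribute, not on
   anything the C or C++ standards promise. */
static int startup_log[2];   /* records who ran, in order */
static int startup_count;

/* No priority given: runs before main, after prioritized constructors. */
__attribute__((constructor))
static void later_init(void) {
    startup_log[startup_count++] = 2;
}

/* Priorities up to 100 are reserved for the implementation; 101 is the
   earliest slot user code may ask for. fpu_setup() would be called here. */
__attribute__((constructor(101)))
static void early_setup(void) {
    startup_log[startup_count++] = 1;
}
```

This only orders things inside one executable or shared library. Initializers of the shared libraries a program depends on still run earlier, so it doesn’t settle the question of who comes first overall.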

The way I find much easier is to not try to solve this “portably” by working against the semantics prescribed by the C++ standard, but instead rely on the actual implementation, which usually has a defined entry point, and a bunch of functions known to be called by the entry point before main. For example, the GNU libc uses a function called __libc_start_main, which is eventually called by the code at _start (the “true” entry point containing the first executed instruction, AFAIK; I suck at GNU/Linux and only know what was enough to get by until now.) In general, running `objdump -T <program> | grep start` (which looks for symbols from shared libraries - `nm <program>` will miss those) is likely to turn up some interesting function. In these situations, some people prefer to find out from the documentation, others prefer to crawl under a table and die of depression; the grepping individuals of my sort are somewhere in between.

Now, instead of building (correctly configure-ing and make-ing) our own version of libc with __libc_start_main calling the dreaded fpu_setup, we can use $LD_PRELOAD - an env var telling the loader to load our library first. If we trick the loader into loading a shared library containing the symbol __libc_start_main, it will override libc’s function with the same name. (I’m not very good at dynamic loading, but the sad fact is that it’s totally broken, under both Windows and Unix, in the simple sense that where a static linker would give you a function redefinition error, the dynamic loader will pick a random function of the two sharing a name, or it will call one of them from some contexts and the other one from other contexts, etc. But if you ever played with dynamic loading, you already know that, so enough with that.)

Here’s a __libc_start_main function calling fpu_setup and then the actual libc’s __libc_start_main:

#include <dlfcn.h>

//fpu_setup() from above goes into the same file
typedef int (*fcn)(int (*main)(int, char **, char **), int argc, char **ubp_av, void (*init)(void), void (*fini)(void), void (*rtld_fini)(void), void *stack_end);
int __libc_start_main(int (*main)(int, char **, char **), int argc, char **ubp_av, void (*init)(void), void (*fini)(void), void (*rtld_fini)(void), void *stack_end)
{
    fpu_setup();
    //find the real __libc_start_main in libc and pass all the arguments on to it
    void* handle = dlopen("/lib/libc.so.6", RTLD_LAZY | RTLD_GLOBAL);
    fcn start = (fcn)dlsym(handle, "__libc_start_main");
    return (*start)(main, argc, ubp_av, init, fini, rtld_fini, stack_end);
}

Pretty, isn’t it? Most of the characters are spent on spelling the arguments of this monstrosity - not really interesting since we simply propagate whatever args turned up by grepping/googling for “__libc_start_main” to the “real” libc’s __libc_start_main. dlopen and dlsym give us access to that real __libc_start_main, and /lib/libc.so.6 is where my Linux box keeps its libc (I found out using `ldd <program> | grep libc`).

If you save this to a fplib.c file, you can use it thusly:

gcc -shared -fPIC -o fplib.so fplib.c -ldl
env LD_PRELOAD=./fplib.so <program>

And now your program should finally dump core at the point in the global construction sequence where NaN is computed.

This approach has the nice side-effect of enabling you to “instrument” unsuspecting programs without recompiling them s.t. they run with a reconfigured FPU (to have them crash if they compute NaNs, unless of course they explicitly configure the FPU themselves instead of relying on what they get from the system.) But there are niftier applications of dynamic preloading, such as valgrind on Linux and .NET on Windows (BTW, I don’t know how to trick Windows into preloading, just that you can.) What I wanted to illustrate wasn’t how great preloading is, but the extent to which C++, the language forcing you to sink that low just to execute something at the beginning of your program, SUCKS.

Barf.

Corrections - thanks to the respective commenters for these:

1. Section 3.6.2/1 of the ISO C++ standard states that “dynamically initialized [objects] shall be initialized in the order in which their definition appears in the translation unit”. So at least you have that out of your way if you want to deal with the problem at the source code level.

2. Instead of hard-coding the path to libc.so, you can pass RTLD_NEXT to dlsym.
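The second correction can be sketched as follows. This is an assumption-laden rewrite of the wrapper above, not something tested against every glibc: it relies on glibc’s RTLD_NEXT extension (hence _GNU_SOURCE), and early_hook is a made-up stand-in for fpu_setup():

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* must come before any #include; RTLD_NEXT is a GNU extension */
#endif
#include <dlfcn.h>
#include <stdlib.h>

typedef int (*main_fn)(int, char **, char **);
typedef int (*start_fn)(main_fn, int, char **, void (*)(void),
                        void (*)(void), void (*)(void), void *);

static int early_hook_calls;  /* lets you check that the hook really ran */

/* Hypothetical stand-in: the real thing would call fpu_setup() here. */
static void early_hook(void) {
    early_hook_calls++;
}

int __libc_start_main(main_fn main, int argc, char **ubp_av,
                      void (*init)(void), void (*fini)(void),
                      void (*rtld_fini)(void), void *stack_end)
{
    early_hook();
    /* RTLD_NEXT asks for the next definition of the symbol after this
       object in the lookup order, which should be libc's own. */
    start_fn real = (start_fn)dlsym(RTLD_NEXT, "__libc_start_main");
    if (!real)
        abort();  /* nothing sensible left to do this early */
    return real(main, argc, ubp_av, init, fini, rtld_fini, stack_end);
}
```

It is built and preloaded the same way as the hard-coded version; glibc older than 2.34 keeps dlsym in libdl, so there you’d add -ldl when linking.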

The internal free market

This is going to be a bit atypical, because I’m going to talk, like, about organizing large teams of programmers. Which I rarely do, for the simple reason that it’s not my problem. I’m not a manager, I don’t think I’m likely to do a particularly good job as a manager in the near future, and I don’t want to be a manager. As far as I’m concerned - if your problem is organizing lots of people, you brought it upon yourself. So this “internal free market” thing, which tends to work well according to my observations, is an exception to my general rule of not making or thinking much about “organizational” observations.

So, free markets. Basically a way to create incentives (because you have to compete) at the cost of redundancy (because of duplicated efforts by many competitors). A redundancy vs dependencies issue, if you like - several competitors means less dependence on each - and since I generally think redundancy is a fair price to pay for removing dependencies, you can guess that I’m leaning towards free market fundamentalism.

At this stage I’m skipping the detours where I’m dragging exciting pseudo-related stuff into this, like the subprime mortgage crisis and the enigmatically overwhelming support for Barack Obama by top programmers and tech bloggers. I’m skipping that to get to the simple point of there being no good way for an employer to create incentives for programmers with money.

How exactly this trick fails to work, and what kind of LOCs you get when you pay by the LOC is out of my scope. If you’re sincerely surprised, there’s lots of material for you to browse - Econ 101 Management being as good a place to start as any. The single thing I have to say about the “financial incentive” method here is that its failure isn’t at all surprising to the free market fundamentalist.

In a free market, people solve their own problems, and (pay to) use the output of other people only when it helps them solve their problems, in a completely distributed way. Setting prices for things and subsidizing them is the trademark of a centralized, controlled economy. Now, does an employer paying by the LOC or by 1/#defects or whatever create an “internal free market”, or an “internal government bureaucracy”? Without looking at soft stuff, like a hypothetical offense to sacred engineering values carried by the act of creating incentives, we can see that subsidizing LOCs will create surplus LOCs, just the way it works with agricultural surplus and everywhere else.

What would we do if we wanted a real internal free market? “Free market” means that we want to have people solving their problems, and let them “pay” each other in some way to solve them, without us controlling the latter process. In our specific context:

  1. “People” means “strong programmers” (or at least decent - or else why did “we”, the employer, hire them, dammit?) Folks who, at the very least, like to be productive and have their stuff used. Maybe they also “like to solve puzzles”, but not all do. For example, I hate puzzles, and have a strong preference for Alexandrian solutions. But you should still hire me.
  2. For those people, “solving their problems” means delivering user-visible features. This is the basic responsibility of developers towards the organization, this is what the organization is capable of judging (it better be), and this is what links the whole operation to reality through the external market forces.

The only question is, what does it mean for a developer to “pay” another developer? Paying with money makes no sense, not with our definition of “people”. People who’re into those transactions tend to be self-employed. However, developers do have their own currency, karma points or whatever term you prefer to use for it (they are all irksome; economics is to life what proctology is to anatomy - it’s ugly because it’s true). I know two kinds:

  1. Time. You can “pay” developers, teams and the whole organization by volunteering to do particularly unsatisfying but important/urgent work.
  2. Code. When someone uses your code, they are paying you (I repeat, it’s not you who are giving something to them; do not make this error or they’ll stop using your code.)

Time and code are not unlike gold and printed money, because you can’t make more time but you can make more code. However, proceeding with this analogy and trying to scale it to include inflation and such will expose my economic illiteracy and general lunacy to an extent making me want to stop now.

What we’ll do now is examine how “trading” time and code works in programming, and how it creates incentives to invest efforts into the most needed things similarly to the way prices do in free market economies. We’ll start with “trading code” - the less intuitive but the more fundamental kind of transaction.

You have to deliver something user-visible. The user couldn’t care less about the guts making up your program - how you parse things, what your communication protocols are, which optimized libraries you use for processing bottlenecks, etc. However, you do care about these things, because they are really needed for things to work. With all these things, you could reuse “infrastructure” others work on (for example, me), or you could roll your own (let’s ignore “international trade” where you use a third-party library for the moment).

To you, depending on me is a risk - who knows how many bugs I have, how well my stuff maps to your needs, etc. To me, on the other hand, having you reuse my code is the best thing that can happen to me during work hours. After work, better things can happen, in particular, those having to do with spending the money earned during work hours. But at work, the best thing that I can do is be productive and have others use my code. Lots of users is the workplace fortune equivalent to being rich in the real world. Do you see who pays who here?

What can an organization do to manage these infrastructure transactions?

  1. The “economical”, “capitalist” solution: leave them alone except for securing them. “Leaving them alone” means not controlling them - not mandating the development and reuse of infrastructure and not assigning workforce to it. This means that by making my modules reusable, I’m only trying to please my internal users, so I’m likely to (try to) invest most effort into what they find important and helpful for doing their ultimate job. “Securing transactions” means something similar to the way public companies are forced to expose their accounting. If something becomes reusable code, it ought to have proper documentation, versioning, etc., and the organization must make sure it does.
  2. The “political”, “socialist” solution: assign the task of developing a parser, an optimized library, etc. to a person/team - subsidize the parser (the price to a user is now lower - even if the parser is all buggy, a person is officially assigned to fix the bugs, and the responsibility for failures moves to that person and not to the one who decided to reuse the code). This means that the parser will be created even if in a “free company” nobody would want to develop and maintain it, knowing that most people wouldn’t take the risk of using it for the benefits it provides. Leading to surplus crops of parsers.
  3. A further improvement on 2, the “communist” solution: force everyone to use The Parser. This means there are no “economical” means to punish the author whatsoever - where “punishing” means “not paying” and “not paying” means “stop using the code”. However, there’s still hope: you have political means to punish the author. For example, poke fun at the goddamn nightmare infrastructure, yell at the author, yell at his manager, ask your manager to yell at his manager’s manager - a whole slew of counter-infrastructure measures. Victims of infrastructural communism use them all the time.

So this is how “trading code” is (more accurately, “can be”) a better way of evolving reusable “infrastructure” than centralized planning. In general, the only thing I’m discussing is the reusable stuff - that’s what organizations can optimize (or pessimize, creating useless “reusable” modules and not creating the actually needed ones). Nothing can be done about things which aren’t reusable by definition, belonging to a single “feature”/“project” - those will have to be written once and only once no matter what.

What’s wrong with this picture? Could be many things, but one thing I’ll talk about (because I have a good answer for it) is the problem of “instant gratification”/“disruptive changes”/“local optimums”/etc. There are grand things that just can’t be done by small incremental changes, the by-products of work on “specific features”. You really need a person/team assigned to these things. This is somewhat similar to economies of scale which can be achieved by purchasing expensive machinery. How are many small farmers/shoe makers/etc. going to raise money for that machinery without central planning, if they’re all busy with their small short-term profits?

This is where entrepreneurs come into play. Entrepreneurs are people with fire up their asses. Normal people want enough money to get by, enough money to not worry about money, or enough money to not have to work for money. Entrepreneurs want more money than they can sensibly spend during the decades of their lifetime. And they want it because they desperately need that money to feed the fire raging up their asses. When they see a potential for making truckloads of money, many of them are willing to put their own savings on the line to chase that chance.

This psychological profile is a speculation of mine - my best attempt to comprehend the inexplicable behavior of making efforts and burning nerve endings to make more money than you could possibly need. However, I do have motivation which is quite similar in the context of our “economics of programming” analogy. I’m a “programming entrepreneur”, or at least I have, um, the same trademark proctological fireworks. I’m thrilled by opportunities to make stuff that, like, everybody will use, everything will depend on, …and everyone will want a piece of me when it breaks - so? It’s still worth it.

I can’t make such stuff as a by-product of working on something reasonably user-visible. I need to be assigned to it. What are the savings that I can put on the line? Time invested into doing unsatisfying, but important work. I call my own way of making these deals “buying development with debugging”. I’m usually willing to debug the weirder of the urgent problems, although it’s not much fun by itself, because it translates to a lot of karma points. I can then spend those karma points by working on what I want 80% of the time, 20% being the continuous urgent debugging tax.

Again, there’s more than one way for that kind of “entrepreneur” to start a programming venture:

  1. The “economical” way - spend my own time implementing my ideas. Like a “real” entrepreneur putting his savings on the line and forced to look at his company bleeding that money if it doesn’t take off, I will want to stop as soon as possible when I realize that I’m failing. Those so-called “organizational karma points” you gain in the trenches have better uses than wasting on the development of worthless stuff nobody will use.
  2. The “political” way - convincing “the government” (a manager) that my idea is worth implementing, and having someone assigned to it. Now nobody wants to admit the failure early on. I’m not losing anything when someone else struggles with the implementation - “I could do it better”. The person working on the thing isn’t really held responsible for the failure, either - not his idea, so why not keep trying to make it work? Everybody wants to make his stuff work and be used, after all. And the manager won’t want to admit the failure because of all people, he’ll get most of the blame. Therefore, the worthless effort will not be stopped for a long time.

Free market supporters are sometimes blamed for disrespecting people and reducing human nature to primitive egoism. Well, the only thing I can say is that I sure am a Good Person (how could it be different?), I respect myself lots, I successfully “launched” more than one “programming venture” both ways - “economical” (DIY) and “political” (persuasion), and of each of these two kinds, some succeeded and some failed.

And believe you me, deep down I refuse to take responsibility for the failing “politically launched” projects even now when we talk about it. On the other hand, the “economically launched” failures are - seriously - the best thing that happened to me in my professional life. I attribute most of my occasional successes - or, more accurately, non-failures - to lessons learned from the DIY failures, which I had no choice but to admit responsibility for. (Damn, that was painful. To the extent that wasn’t in my job description.)

Now, I’m not an “internal free market fundamentalist”, simply because I know much more about programming than I do about economics, and obnoxious/oversimplified opinions usually correlate with ignorance. However, my experience seems to show that “internal free markets” are healthy enough to sustain continuous improvements on many scales, and eventually punish both “greedy” “instant gratification” techniques of pleasing managers/customers and architectural masturbation, promoting solid work.

And if you’re not a manager (I mostly care about non-managers, guys and gals like me, you know), I think this quasi-economical angle can contribute to your ability to look at some young initiative around you and say “Hm, this might work out” and conversely “Epic fail on the way, I’m not going to touch this with a laser pointer, man”. So, FYI.

Consistency: how to defeat the purpose of IEEE floating point

I don’t know much about the design of IEEE floating point, except for the fact that a lot of knowledge and what they call “intellectual effort” went into it. I don’t even know the requirements, and I suspect those were pretty detailed and complex (for example, the benefits of having a separate representation for +0 and -0 seem hard to grasp unless you know about the very specific and hairy examples in the complex plane). So I don’t trust my own summary of the requirements very much. That said, here’s the summary: the basic purpose of IEEE floating point is to give you results of the highest practically possible precision at each step of your computation.

I’m not going to claim this requirement is misguided, because I don’t feel like arguing with people two orders of magnitude more competent than myself who have likely faced much tougher numerical problems than I’ve ever seen. What I will claim is that differences in numerical needs divide programmers into roughly three camps, and the highest-possible-precision approach hurts one of them really badly, and so has to be worked around in ways we’ll discuss. The camps are:

  1. The huge camp of people who do businessy accounting. Those should work with integral types to get complete, deterministic, portable control over rounding and all that. Many of the clueless people in this camp represent 1 dollar and 10 cents as the floating point number 1.1. While they are likely a major driving force behind economic growth, I still think they deserve all the trouble they bring upon themselves.
  2. The tiny camp doing high-end scientific computing. Those are the people who can really appreciate the design of IEEE floating point and use its full power. It’s great that humanity accidentally satisfied the needs of this small but really cool group, making great floating point hardware available everywhere through blind market forces. It’s like having a built-in Stradivari in each home appliance. Yes, perhaps I exaggerate; I get that a lot.
  3. The sizable camp that deals with low-end to mid-range semi-scientific computing. You know, programs that have some geometry or physics or algebra in them. 99.99% of the code snippets in that realm work great with 64b floating point, without the author having invested any thought at all into “numerical analysis”. 99% of the code works with 32b floats. When someone stumbles upon a piece of code in the 1% and actually witnesses fatal precision loss, everybody gathers to have a look as if they heard about a beautiful rainbow or a smoke suggesting a forest fire near the horizon.
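A concrete taste of camp 1’s trouble, as a minimal sketch (the function names are made up for illustration): ten cents has no exact binary representation, so summing dimes in floating point drifts, while integer cents stay exact.

```c
/* Add ten cents n times, once as a double and once as integer cents. */
double add_dimes_as_double(int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += 0.10;        /* 0.10 is only approximated in binary */
    return total;
}

long add_dimes_as_cents(int n) {
    long total = 0;
    for (int i = 0; i < n; i++)
        total += 10;          /* exact, always */
    return total;
}
```

With 64-bit doubles, ten dimes added this way come out a hair below one dollar rather than exactly 1.0, while the integer version gives exactly 100 cents.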

The majority of people who use and actually should be using floating point are thus in camp 3. Those people don’t care about precision anywhere near camp 2, nor do they know how to make the best of IEEE floating point in the very unlikely circumstances where their naive approach will actually fail to work. What they do care about though is consistency. It’s important that things compute the same on all platforms. Perhaps more importantly for most, they should compute the same under different build settings, most notably debug and release mode, because otherwise you can’t reproduce problems.

Side note: I don’t believe in build modes; I usually debug production code in release mode. It’s not just floating point that’s inconsistent across modes - it’s code snippets with behavior undefined by the language, buggy dependence on timing, optimizer bugs, conditional compilation, etc. Many other cans of worms. But the fact is that most people have trouble debugging optimized code, and nobody likes it, so it’s nice to have the option to debug in debug mode, and to do that, you need things to reproduce there.

Also, comparing results of different build modes is one way to find worms from those other cans, like undefined behavior and optimizer bugs. Also, many changes you make are optimizations or refactorings and you can check their sanity by making sure they didn’t change the results of the previous version. As we’ll see, IEEE FP won’t give you even that, regardless of platforms and build modes. The bottom line is that if you’re in camp 3, you want consistency, while the “only” things you can expect from IEEE FP are precision and speed. Sure, “only” should be put in quotes because it’s a lot to get; it’s just a real pity that fulfilling the smaller and more popular wish for consistency is somewhere between hard and impossible.

Some numerical analysts seem annoyed by the camp 3 whiners. To them I say: look, if IEEE FP wasn’t the huge success that it is in the precision and speed departments, you wouldn’t be hearing from us because we’d be busy struggling with those problems. What we’re saying is the exact opposite of “IEEE FP sucks”. It’s “IEEE FP is so damn precise and fast that I’m happy with ALL of its many answers - the one in optimized x86 build, the one in debug PowerPC build, the one before I added a couple of local variables to that function and the one I got after that change. I just wish I consistently got ONE of these answers, any of them, but the same one.” I think it’s more flattering than insulting.

I’ve accumulated quite some experience in defeating the purpose of IEEE floating point and getting consistency at the (tiny, IMO) cost of precision and speed. I want to share this knowledge with humanity, with the hope of getting rewarded in the comments. The reward I’m after is a refutation of my current theory that you can only eliminate 95%-99% of the pain automatically and have to solve the rest manually each time it raises its ugly head.

The pain breakdown

I know three main sources of floating point inconsistency pain:

  1. Algebraic compiler optimizations
  2. “Complex” instructions like multiply-accumulate or sine
  3. x86-specific pain not available on any other platform; not that ~100% of non-embedded devices is a small market share for a pain.

The good news is that most pain comes from item 3 which can be more or less solved automatically. For the purpose of decision making (“should we invest energy into FP consistency or is it futile?”), I’d say that it’s not futile and if you can cite actual benefits you’d get from consistency, then it’s worth the (continuous) effort.

Disclaimer: I only discuss problems I know and to the extent of my understanding. I have no solid evidence that this understanding is complete. Perhaps the pain breakdown list should have item 4, and perhaps items 1 to 3 have more meat than I think. And while I tried to get the legal stuff right - which behavior conforms to IEEE 754, which conforms to C99, and which conforms to nothing but is still out there - I’m generally a weak tech lawyer and can be wrong. I can only give you the “worked on my 4 families of machines” kind of warranty.

Algebraic compiler optimizations

Compilers, or more specifically buggy optimization passes, assume that floating point numbers can be treated as a field - you know, associativity, distributivity, the works. This means that a+b+c can be computed in either the order implied by (a+b)+c or the one implied by a+(b+c). Adding actual parentheses in source code doesn’t help you one bit. The compiler assumes associativity and may implement the computation in the order implied by regrouping your parentheses. Since each floating point operation loses precision on some of the possible inputs, the code generated by different optimizers or under different optimization settings may produce different results.
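To see why regrouping matters, here is a small sketch (function names are illustrative only) in which the two groupings of a+b+c give different answers in IEEE double precision:

```c
/* The same three inputs, summed with the two possible groupings. */
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

double sum_right(double a, double b, double c) {
    return a + (b + c);
}

/* With a = 1e20, b = -1e20, c = 1.0:
   sum_left:  a + b is exactly 0, and 0 + 1 is 1.
   sum_right: b + c rounds back to -1e20 (1 is far below the spacing of
              doubles near 1e20), so the total is 0. */
```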

This could be extremely intimidating because you can’t trust any FP expression with more than 2 inputs. However, I think that programming languages in general don’t allow optimizers to make these assumptions, and in particular, the C standard doesn’t (C99 §5.1.2.3 #13, didn’t read it in the document but saw it cited in two sources). So this sort of optimization is actually a bug that will likely be fixed if reported, and at any given time, the number of these bugs in a particular compiler should be small.

I only recall one recurring algebraic optimization problem. Specifically, a*(b+1) tended to be compiled to a*b+a in release mode. The reason is that floating-point literal values like 1 are expensive; 1 becomes a hairy hexadecimal value that has to be loaded from a constant address at run time. So the optimizer was probably happy to optimize a literal away. I was always able to solve this problem by changing the spelling in the source code to a*b+a - the optimizer simply had less work to do, while the debug build saw no reason to make me miserable by undoing my regrouping back into a*(b+1).
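The two spellings really can disagree. A minimal sketch with hand-picked inputs (the function names and values are mine, not taken from any compiler’s output), assuming each operation is rounded to a 64-bit double:

```c
/* a*(b+1), as written in the source */
double scale_grouped(double a, double b) {
    return a * (b + 1.0);
}

/* a*b+a, the form the optimizer preferred */
double scale_expanded(double a, double b) {
    return a * b + a;
}

/* With a = 3.0 and b = 0x1p-53 (that is, 2^-53):
   grouped:  b + 1 is a tie that rounds to 1.0, so the result is exactly 3.0.
   expanded: a*b = 3 * 2^-53 is exact, and adding it to 3.0 rounds up
             to the next double above 3.0. */
```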

This implies a general way of solving this sort of problem: find what the optimizer does by looking at the generated assembly, and do it yourself in the source code. This almost certainly guarantees that debug and release will work the same. With different compilers and platforms, the guarantee is less certain. The second optimizer may think that the optimization you copied from the first optimizer into your source code is brain-dead, and undo it and do a different optimization. However, that means you target two radically different optimizers, both of which are buggy and can’t be fixed in the near future; how unlucky can you get?

The bottom line is that you rarely have to deal with this problem, and when it can’t be solved with a bug report, you can look at the assembly and do the optimization in the source code yourself. If that fails because you have to use two very different and buggy compilers, use the shotgun described in the next item.

“Complex” instructions

Your target hardware can have instructions computing “non-trivial” expressions beyond a*b or a+b, such as a+=b*c or sin(x). The precision of the intermediate result b*c in a+=b*c may be higher than the size of an FP register would allow, had that result been actually stored in a register. IEEE and the C standard think it’s great, because the single instruction generated from a+=b*c is both faster and more precise than the 2 instructions implementing it as d=b*c, a=a+d. Camp 3 people like myself don’t think it’s so great, because it happens in some build modes but not others, and across platforms the availability of these instructions varies, as does their precision.
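The effect of keeping b*c at higher precision can be imitated in portable C. This is a sketch, not what any particular instruction does: it assumes long double is wider than double on your machine (true for GCC on x86, not for every compiler), and the function names are invented. C99’s fma() from <math.h> is the standard way to ask for the genuinely fused operation.

```c
#include <float.h>

/* d = b*c rounded to double first, then a + d: the "2 instructions" form.
   volatile keeps the compiler from fusing or widening the product. */
double mac_two_steps(double a, double b, double c) {
    volatile double d = b * c;
    return a + d;
}

/* The product is kept in a wider format before the addition, roughly
   what a multiply-accumulate instruction does internally. */
double mac_wide_product(double a, double b, double c) {
    long double d = (long double)b * (long double)c;
    return (double)(a + d);
}

/* With a = -1, b = 1 + 2^-27, c = 1 - 2^-27:
   the exact product is 1 - 2^-54, which rounds to 1.0 as a double,
   so the two-step form gives 0 while the wide form gives -2^-54. */
```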

AFAIK the “contraction” of a+=b*c is permitted by both the IEEE FP standard (which defines FP + and *) and the C standard (which defines FP types that can map to standards other than IEEE). On the other hand, sin(x), which also gets implemented in hardware these days, isn’t addressed by either standard - to the same effect of making the optimization perfectly legitimate. So you can’t solve this by reporting a bug the way you could with algebraic optimizations. The other way in which this is tougher is that tweaking the code according to the optimizer’s wishes doesn’t help much. AFAIK what can help is one of these two things:

  1. Ask the compiler to not generate these instructions. Sometimes there’s an exact compiler option for that, like gcc’s platform-specific -mno-fused-madd flag, or there’s a (defined and actually implemented) pragma directive such as #pragma STDC FP_CONTRACT. Sometimes you don’t have any of that, but you can lie to the compiler that you’re using an older (compatible) revision of the processor architecture without the “complex” instructions. The latter is an all-or-nothing thing enabling/disabling lots of stuff, so it can degrade performance in many different ways; you have to check the cost.
  2. If compiler flags can’t help, there’s the shotgun approach I promised to discuss above, also useful for hypothetical cases of hard-to-work-around algebraic optimizations. Instead of helping the optimizer, we get in its way and make optimization impossible using separate compilation. For example, we can convert a+=b*c to a+=multiply_dammit(b,c); multiply_dammit is defined in a separate file. This makes it impossible for most optimizers to see the expression a+=b*c, and forces them to implement multiplication and addition separately. Modern compilers support link-time inlining and then they do optimize the result as a whole. But you can disable that option, and as a side effect speed up linkage a great deal; if that seriously hurts performance, your program is insane and you’re a team of scary ravioli coders.

The trouble with the shotgun approach, aside from its ugliness, is that you can’t afford to shoot at the performance-critical parts of your code that way. Let us hope that you’ll never really have to choose between FP consistency and performance, as I’ve never had to date.
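To make the a+=b*c business concrete, here is a minimal sketch (mine, not lifted from any particular compiler's output) of how a fused multiply-add can disagree with a separate multiply and add. The function names and the inputs are made up for illustration. std::fma is the standard library's single-rounding multiply-add, and the volatile store is a single-file stand-in for what a call to a separately compiled multiply_dammit achieves.

```cpp
#include <cmath>

// Multiply and add as two separately rounded operations. The volatile
// store plays the role of the out-of-line multiply_dammit() call: it forces
// the product to be rounded to a 64b double before the addition sees it.
double mul_then_add(double a, double b, double c) {
    volatile double product = b * c;
    return a + product;
}

// Multiply and add with a single rounding at the end, which is what a
// fused multiply-add instruction computes.
double fused_mul_add(double a, double b, double c) {
    return std::fma(b, c, a);
}

// Hypothetical inputs picked so that the two versions disagree:
// with b = c = 1 + 2^-27, b*c is exactly 1 + 2^-26 + 2^-54, and the
// 2^-54 term is lost when the product is rounded to a double.
double demo_b() { return 1.0 + std::ldexp(1.0, -27); }
double demo_a() { return -(1.0 + std::ldexp(1.0, -26)); }
```

With a = demo_a() and b = c = demo_b(), mul_then_add returns 0 and fused_mul_add returns 2^-54, assuming IEEE doubles and no -ffast-math style flags. Neither answer is "wrong"; they just aren't the same answer.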

x86

Intel is the birthplace of IEEE floating point, and the manufacturer of the most camp-3-painful and otherwise convoluted FP hardware. The pain comes, somewhat understandably, from a unique commitment to the IEEE FP philosophy - intermediate results should be as precise as possible; more on that in a moment. The “convoluted” part is consistent with the general insanity of the x86 instruction set. Specifically, the “old” (a.k.a “x87”) floating point unit uses a stack architecture for addressing FP operands, which is pretty much the exact opposite of the compiler writer’s dream target, but so is the rest of x86. The “new” floating point instructions in SSE don’t have these problems, at the cost of creating the aesthetic/psychiatric problem of actually having two FP ISAs in the same processor.

Now, in our context we don’t care about the FP stack thingie and all that, the only thing that matters is the consistency of precision. The “old” FP unit handles precision thusly. Precision of stuff written to memory is according to the number of bits of the variable, ’cause what else can it be. Precision of intermediate results in the “registers” (or the “FP stack” or whatever you call it) is defined according to the FPU control & status register, globally for all intermediate results in your program.

By default, it’s 80 bits. This means that when you compute a*b+c*d and a,b,c,d are 32b floats, a*b and c*d are computed in 80b precision, and then their 80b sum is converted to a 32b result in memory (if a*b+c*d is indeed written to memory and isn’t itself an “intermediate” result). Indeed, what’s “intermediate” in the sense of not being written to memory and what isn’t? That depends on:

  1. Debug/release build. If we have “float e=a*b+c*d”, and e is only used once right in the next line, the optimizer probably won’t see a point in writing it to memory. However, in a debug build there’s a good reason to allocate it in memory, because if you single-step your program and you’re already past the line that used e, you still might want to look at the value of e, so it’s good that the compiler kept a copy of it for the debugger to find.
  2. The code “near” e=a*b+c*d according to the compiler’s notion of proximity also affects its decisions. There are only so many registers, and sometimes you run out of them and have to store things in memory. This means that if you add or remove code near the line or in inline functions called near the line, the allocation of intermediate results may change.
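Here is a small sketch of why it matters whether a*b and c*d get rounded to 32b on the way. It doesn't need an x87 to run: the "narrow" version forces every intermediate to a float with volatile stores, and the "wide" version uses double as a stand-in for the 80b registers (an assumption on my part; double is already wide enough for the inputs below). The function names are made up.

```cpp
#include <cmath>

// a*b + c*d with every intermediate result rounded to a 32b float, as an
// SSE build (or a build that spills intermediates to memory) computes it.
float sum_of_products_narrow(float a, float b, float c, float d) {
    volatile float ab = a * b;  // volatile forces a store, hence rounding
    volatile float cd = c * d;
    volatile float sum = ab + cd;
    return sum;
}

// The same expression with intermediates kept in a wider format and only
// the final result rounded to float. double stands in for the x87 80b
// registers here.
float sum_of_products_wide(float a, float b, float c, float d) {
    double ab = static_cast<double>(a) * b;
    double cd = static_cast<double>(c) * d;
    return static_cast<float>(ab + cd);
}
```

Try a = b = 1 + 2^-12, c = -(1 + 2^-11), d = 1. The exact product a*b is 1 + 2^-11 + 2^-24, which doesn't fit in a float and gets rounded to 1 + 2^-11. So the narrow version returns 0 and the wide version returns 2^-24. Same source expression, different answers, depending on where the compiler happened to keep things.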

Compilers could have an option asking them to hide this mess and give us consistent results. The problems with this are that (1) if you care about cross-platform/compiler consistency, then the availability of cross-mode consistency options in one compiler doesn’t help with the other compiler and (2) for some reason, compilers apparently don’t offer this option in a very good way. For example, MS C++ used to have a /fltconsistency switch but seems to have abandoned it in favor of an insane special-casing of the syntax float(a*b)+float(c*d) - that spelling forces consistency (although the C++ standard doesn’t assign it a special meaning not included in the plain and sane a*b+c*d).

I’d guess they changed it because of the speed penalty it implies rather than the precision penalty as they say. I haven’t heard about someone caring both about consistency and that level of precision, but I did hear that gcc’s consistency-forcing -ffloat-store flag caused notable slowdowns. And the reason it did is implied by its name - AFAIK the only way to implement x86 FP consistency at compile time is to generate code storing FP values to memory to get rid of the extra precision bits. And -ffloat-store only affects named variables, not unnamed intermediate results (annoying, isn’t it?), so /fltconsistency, assuming it actually gave you consistency of all results, should have been much slower. Anyway, the bottom line seems to be that you can’t get much help from compilers here; deal with it yourself. Even Java gave up on its initial intent of getting consistent results on the x87 FPU and retreated to a cowardly strictfp scheme.

And the thing is, you never have to deal with it outside of x86 - all floating point units I’ve come across, including the ones specified by Intel’s SSE and SSE2, simply compute 32b results from 32b inputs. People who decided to do it that way and rob us of quite some bits of precision have my deepest gratitude, because there’s absolutely no good way to work around the generosity of the original x86 FPU designers and get consistent results. Here’s what you can do:

  1. Leave the FP CSR configured to 80b precision. 32b and 64b intermediate results aren’t really 32b and 64b. The fact that it’s the default means that if you care about FP result consistency, intensive hair pulling during your first debugging sessions is an almost inevitable rite of passage.
  2. Set the FP CSR to 64b precision. If you only use 64b variables, debug==release and you’re all set. If you have 32b floats though, then intermediate 32b results aren’t really 32b. And usually you do have 32b floats.
  3. Set the FP CSR to 32b precision. debug==release, but you’re far from “all set” because now your 64b results, intermediate or otherwise, are really 32b. Not only is this a stupid waste of bits, it is not unlikely to fail, too, because 32b isn’t sufficient even for some of the problems encountered by camp 3. And of course it’s not compatible with other platforms.
  4. Set the FP CSR to 64b precision during most of the program run, and temporarily set it to 32b in the parts of the program using 32b floats. I wouldn’t go down that path; option 4 isn’t really an option. I doubt that you use both 32b and 64b variables in a very thoughtful way and manage to have a clear boundary between them. If you depend on the ability of everyone to correctly maintain the CSR, then it sucks sucks sucks.

Side note: I sure as hell don’t believe in “very special” “testing” build/running modes. For example, you could say that you have a special mode where you use option (3) and get 32b results, and use that mode to test debug==release or something. I think it’s completely self-defeating, because the point of consistency is being able to reproduce a phenomenon X that happens in a mode which is actually important, in another mode where reproducing X is actually useful. Therefore, who needs consistency across inherently useless modes? We’d be defeating the purpose of defeating the purpose of IEEE floating point.

Therefore, if you don’t have SSE, the only option is (2) - set the FP CSR to 64b and try to avoid 32b floats. On Linux, you can do it with:

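#include <fpu_control.h> //glibc header, x86 only
//switch x87 intermediate results from 80b extended to 64b double precision
void fpu_set_double_precision() {
  fpu_control_t cw;
  _FPU_GETCW(cw);
  cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE; //clear the precision field, then select double
  _FPU_SETCW(cw);
}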

Do it first thing in main(). If you use C++, you should do it first thing before main(), because people can use FP in constructors of global variables. This can be achieved by figuring out the compiler-specific translation unit initialization order, compiling your own C/C++ start-up library, overriding the entry point of a compiled start-up library using stuff like LD_PRELOAD, overwriting it in a statically linked program right there in the binary image, having a coding convention forcing everyone to call FloatingPointSingleton::instance() before using FP, or shooting the people who like to do things before main(). It’s a trade-off.
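Of these, the coding convention is the only one that fits in a few lines of portable source, so here is a rough sketch of it. Only the name FloatingPointSingleton::instance() comes from the text above; configure_fpu, the call counter and the Gain example are hypothetical. The actual control word tweaking is replaced by a stub so that the sketch builds on any platform.

```cpp
// Stand-in for the real work. On Linux/x86 this is where the
// _FPU_GETCW/_FPU_SETCW sequence would go; here it only counts calls so
// the sketch builds anywhere.
static int g_fpu_setup_calls = 0;
static void configure_fpu() { ++g_fpu_setup_calls; }

class FloatingPointSingleton {
public:
    // The first call configures the FPU; later calls do nothing.
    static FloatingPointSingleton& instance() {
        static FloatingPointSingleton the_instance;  // constructed on first use
        return the_instance;
    }
private:
    FloatingPointSingleton() { configure_fpu(); }
    FloatingPointSingleton(const FloatingPointSingleton&);             // not copyable
    FloatingPointSingleton& operator=(const FloatingPointSingleton&);  // not assignable
};

// The convention in action: a global whose constructor uses FP calls
// instance() first, so it works no matter when it gets constructed.
struct Gain {
    double value;
    Gain() {
        FloatingPointSingleton::instance();
        value = 1.0 / 3.0;
    }
};
static Gain g_gain;
```

The obvious weakness is that it only works if everybody remembers to call instance(), which is exactly the kind of thing people doing things before main() tend not to remember.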

The situation is really even worse because the FPU CSR setting only affects mantissa precision but not the exponent range, so you never work with “real” 64b or 32b floats there. This matters in cases of huge numbers (overflow) and tiny numbers (double rounding of subnormals). But it’s bad enough already, and camp 3 people don’t really care about the extra horror; if you want those Halloween stories, you can find them here. The good news is that today, you are quite likely to have SSE2 and very likely to have SSE on your machine. So you can automatically sanitize all the mess as follows:

  1. If you have SSE2, use it and live happily ever after. SSE2 supports both 32b and 64b operations and the intermediate results are of the size of the operands. BTW, mixed expressions like a+b where a is float and b is double don’t create consistency problems on any platform because the C standard specifies the rules for promotion precisely and portably (a will be promoted to double). The gcc way of using SSE2 for FP is -mfpmath=sse -msse2.
  2. If you only have SSE, use it for 32b floats which it does support (gcc: -mfpmath=sse -msse). 64b floats will go to the old FP unit, so you’ll have to configure it to 64b intermediate results. Everything will work, the only annoying things being (1) the retained necessity to shoot the people having fun before main and (2) the slight differences in the semantics of control flags in the old FP and the SSE FP CSR, so if you configure your own policy, floats and doubles will not behave exactly the same. Neither problem is a very big deal.

Interestingly, SSE with its support for SIMD FP commands actually can make things worse in the standard-violating-algebraic-optimizations department. Specifically, Intel’s compiler reportedly has (had?) an optimization which unrolls FP accumulation loops and reorders additions in order to utilize SIMD FP commands (gcc 4 does that, too, but only if you explicitly ask for trouble with -funsafe-math-optimizations or similar). But I wouldn’t conclude anything from it, except that automatic vectorization, which is known to work only on the simplest of code snippets, actually doesn’t work even on them.
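To see why reordering additions isn't an innocent optimization, here is a sketch comparing an in-order accumulation loop with a hand-written two-accumulator version. The latter is roughly what a 2-lane SIMD unrolling does; it's my emulation, not the output of Intel's or anyone else's vectorizer. The volatile accumulators are only there to force 32b rounding after every addition, so the sketch behaves the same with or without x87 extra precision.

```cpp
#include <cassert>
#include <cstddef>

// Accumulate in source order, the way the loop is written.
float sum_in_order(const float* x, std::size_t n) {
    volatile float acc = 0.0f;  // volatile: round to 32b after every add
    for (std::size_t i = 0; i < n; ++i)
        acc = acc + x[i];
    return acc;
}

// Accumulate even and odd elements separately and combine at the end,
// which is roughly what a 2-lane SIMD unrolling of the loop does.
float sum_two_lanes(const float* x, std::size_t n) {
    assert(n % 2 == 0);  // the sketch skips the leftover-element handling
    volatile float lane0 = 0.0f, lane1 = 0.0f;
    for (std::size_t i = 0; i < n; i += 2) {
        lane0 = lane0 + x[i];
        lane1 = lane1 + x[i + 1];
    }
    volatile float total = lane0 + lane1;
    return total;
}
```

Feed both the array {16777216, 1, 1, 1} (that's 2^24 followed by three ones). In order, each 1 is rounded away and the sum stays at 16777216. With two lanes, the ones at odd indexes add up to 2 before meeting the big number, and the sum is 16777218.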

Summary: use SSE2 or SSE, and if you can’t, configure the FP CSR to use 64b intermediates and avoid 32b floats. Even the latter solution works passably in practice, as long as everybody is aware of it.

I think I covered everything I know except for things like long double, FP exceptions, etc. - and if you need that, you’re not in camp 3; go away and hang out with your ivory tower numerical analyst friends. If you know a way to automate away more pain, I’ll be grateful for every FP inconsistency debugging afternoon your advice will save me.

Happy Halloween!

Off topic

  1. To comment, you no longer need to register, just type “y” to confirm you’re a human. Thanks to Aristotle Pagaltzis for pointing out that the previous arrangement sucked.
  2. I’ve started another blog, mostly hosting images.

I originally intended to have one blog for everything, but since you’ve probably subscribed for the technobabble, I’ll reserve the channel for that.

I want a struct linker

Here’s a problem I’ve seen a lot (it’s probably classified as an “Antipattern” or a “Code Smell” and as such has a name in the appropriate circles, but I wouldn’t know, so I’ll leave it nameless).

You have some kind of data structure that you pass around a lot. Soon, the most valuable thing about the structure isn’t the data it keeps, but the fact that it’s available all the way through some hairy flow of control. If you want to have your data flow through all those pipes, just add it to The Data Structure. (To antipattern classification enthusiasts: I don’t think we have a god object yet because we really want to pass our data through that flow and it’s right to have one kind of structure for that and not, say, propagating N+1 function parameters.)

Now suppose the structure holds an open set of data. For example, a spam filter could have a data structure to which various passes add various cues they extract from the message, and other passes can access those cues. We don’t want the structure to know what passes exist and what cues they extract, so that you can add a pass without changing the structure.

I don’t think there’s a good way to do it in a 3GL. In C or C++, you can:

  • Aggregate the cue structures by value (which means you have to recompile everything once you change/add/remove a member from any of them)
  • Keep pointers to the cue structures and use forward declarations to avoid recompilation (a bit slower, and you still have to recompile when you add/remove a whole cue structure)
  • Keep an array of void* or base class objects (not debugger-friendly, and requires a registration procedure to resize the arrays according to the number of passes and deal dynamically computed indexes to the cues to all who wish to access them)
  • Keep a key -> void* map (increasingly slow and debugger-unfriendly, and you need registration to compute the keys from cue names, or use the C substitute for interning - use pointers to global variables with names like &g_my_cue_key as keys)
  • Keep a string -> void* map (no registration or pseudo-interning, but really slow)
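For the record, here is roughly what the key -> void* option with pointers to globals as keys looks like. This is a sketch under my own naming: CueMap, CueKey and the cue structs are hypothetical, and a real spam filter would presumably wrap the casts in something less ugly.

```cpp
#include <map>

// Each cue gets a key by taking the address of a dedicated global; the
// linker guarantees the addresses are distinct, so nobody has to register.
typedef const void* CueKey;

class CueMap {
public:
    void set(CueKey key, void* cue) { cues_[key] = cue; }
    // Returns a null pointer if no pass has stored this cue yet.
    void* get(CueKey key) const {
        std::map<CueKey, void*>::const_iterator it = cues_.find(key);
        return it == cues_.end() ? 0 : it->second;
    }
private:
    std::map<CueKey, void*> cues_;
};

// In the (hypothetical) medication pass:
struct ViagraCue { int mentions; };
char g_viagra_cue_key;  // only its address matters

// In the (hypothetical) enlargement pass:
struct SizeDescriptionsCue { int count; };
char g_size_cue_key;
```

A pass stores its cue with cues.set(&g_viagra_cue_key, &cue) and a reader gets it back with static_cast<ViagraCue*>(cues.get(&g_viagra_cue_key)). Note how the debugger sees nothing but a tree of void pointers, which is the debugger-unfriendliness complained about above.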

On top of JVM or .NET, you have pretty much the same options, plus the option to generate the cue container structure dynamically. Each cue would define an interface and the container structure would implement all those interfaces. The debugger would display containers nicely, and the code accessing them wouldn’t depend on the container class. I’d guess nobody does that though because the class generation part is likely somewhat gnarly.

In a 4GL, you can add attributes to class objects at run time. This is similar to keeping a key->pointer map in a 3GL, except the name interning is handled by the system as it should be, and you don’t confuse debuggers because you’re using a standard feature of the object system. Which solves everything except for the speed issue, which is of course dwarfed by other 4GL speed issues.

Now, I used to think of it as one of the usual speed vs convenience trade-offs, but I no longer think it is, because a struct linker could solve it.

Suppose you could have “distributed” struct/class definitions in an offset-based language; you could write “dstruct SpamCues { ViagraCue viagra; CialisCue cialis; }” in the Medication spam filter module, and “dstruct SpamCues { FallicSymbolsCue fallic; SizeDescriptionsCue size; }” in the Penis Enlargement module. The structure is thus defined by all modules linked into the application.

When someone gets a SpamCues structure and accesses cues.viagra, the compiler generates a load-from-pointer-with-offset instruction - for example, in MIPS assembly it’s spelled “lw reg, offset(ptrreg)”. However, the offset would be left for the linker to resolve, just the way it’s done today for pointers in “move reg, globalobjectlabel” and “jump globalfunclabel”.

This way, access to “distributed” structures would be as fast as “normal” structures. And you would preserve most optimizations related to adjacent offsets. For example, if your machine supports multiple loads, so a rectangle structure with 4 int members can be loaded to 4 registers with “ldm rectptrreg,{R0-R3}” or something, it could still be done because the compiler would know that the 4 members are adjacent; the only unknown thing would be the offset of the rectangle inside the larger struct.
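To make the offset idea concrete without a struct linker, here is a sketch that emulates it at run time: every module asks a layout object for an offset during start-up, and member access becomes base + offset. All the names (DstructLayout, spam_cues_layout, member and so on) are made up, and the offsets here are ordinary variables loaded from memory, which is precisely the cost a real struct linker would get rid of by patching constants into the load instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hands out offsets inside a "distributed" struct as members are added.
class DstructLayout {
public:
    DstructLayout() : size_(0) {}
    std::size_t add_member(std::size_t size, std::size_t align) {
        assert(align != 0);
        std::size_t offset = (size_ + align - 1) / align * align;  // round up
        size_ = offset + size;
        return offset;
    }
    std::size_t size() const { return size_; }
private:
    std::size_t size_;
};

// Function-local static, so the layout exists before any module registers.
DstructLayout& spam_cues_layout() {
    static DstructLayout layout;
    return layout;
}

// Create a member inside the raw storage of one SpamCues object.
template <class T>
T& construct_member(unsigned char* base, std::size_t offset) {
    return *new (base + offset) T();
}

// What "cues.member" would compile to: an access at base + offset.
template <class T>
T& member(unsigned char* base, std::size_t offset) {
    return *reinterpret_cast<T*>(base + offset);
}

// Medication module (hypothetical cue types holding plain data):
struct ViagraCue { int mentions; };
const std::size_t g_viagra_offset =
    spam_cues_layout().add_member(sizeof(ViagraCue), alignof(ViagraCue));

// Enlargement module:
struct SizeDescriptionsCue { double score; };
const std::size_t g_size_offset =
    spam_cues_layout().add_member(sizeof(SizeDescriptionsCue),
                                  alignof(SizeDescriptionsCue));
```

An object is then just spam_cues_layout().size() bytes of suitably aligned storage (the sketch assumes something like a std::vector<unsigned char> allocated after all modules have registered), with each module calling construct_member once and member<ViagraCue>(base, g_viagra_offset) afterwards.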

One issue the linker could have on some architectures is handling very large offsets that don’t fit into the instruction encoding of load-from-pointer-with-offset forms. Well, I’d live even with the dumbest solution where you always waste an instruction to increment a pointer in case the offset is too large. And then you could probably do better than that, similarly to the way “far calls” (calls to functions at addresses too far from the point of call for the offset to fit into 28 bits or whatever the branch offset encoding size is on your machine) are handled today.

The whole thing can fail in the presence of dynamic loading during program run as in dlopen/LoadLibrary; if you already have objects of the structure, and your language doesn’t support relocation because of using native pointers, then the dynamically loaded module won’t be able to add members to a dstruct since it can’t update the existing objects. Well, I can live with that limitation.

If the language generates native object files, there’s the problem of maintaining compatibility with the object file format. I think this could “almost” be done, by mapping a distributed structure to a custom section .dstruct.SpamCues, and implementing members (viagra, cialis, fallic, size) as global objects in that section. Then if an equivalent of a linker script says that the base address of .dstruct.SpamCues is 0, then &viagra will resolve to the offset of the member inside the structure. The change to automatically map sections named .dstruct.* to 0 surely isn’t more complicated than the handling of stuff like .gnu.linkonce, inflicted upon us by the idiocy of C++ templates and the likes of them.

And here’s why I’ll probably never see a struct linker:

  • If the language uses a native linker, a small change must be made to that linker in order to handle encodings of load/store instructions in ways it previously didn’t (currently it only has to deal with resolving pointers, not offsets). Since it’s platform-specific, the small change is actually quite large.
  • You could compromise and avoid that change by generating less efficient code which uses the already available linker ability to resolve the “address” of the viagra object in the zero-based .dstruct.SpamCues section - the code can add that “address” (offset, really) to &cues. But that could still force changes in the compiler back-end because now it has to generate assembly code adding what looks like 2 addresses, which makes no sense today and might be unsupported if the back-end preserves type information.
  • The previous items assume that the portable “front-end” work to support something like dstruct isn’t a big deal. However, I’d guess that not enough people would benefit from it/realize they’d benefit from it to make it appear in a mainstream language and its front-ends.
  • I could roll my own compiler for a language similar to a mainstream one, with a bunch of additions like this struct linker thingie. Two problems with this. One - it’s too hard to parse all the crud in a mainstream language (even if it isn’t C++) to make it worth the trouble, unless your compiler does something really grand; a bunch of nice features probably aren’t worth it. Two - most programmers take a losing approach towards their career where they want to put mainstream languages on their resume so that losers at the other end can scan their resumes for those languages; if your code is spelled in a dialect, you’ll scare off the losers forming the backbone of our industry.

It still amazes me how what really matters isn’t what can be done, but what’s already done. It’s easier to change goddamn hardware than it is to change “software infrastructure” like languages, software tools, APIs, protocols and all kinds of that shit. I mean, here we have something that’s possible and even easy to do, and yet completely impractical.

Guess I’ll have to roll my own yet-another-distributed-reflective-registration bullshit. Oh well.

The cardinal programming jokes

I’m depressed. What I’ll do is I’ll tell you the 3 cardinal programming jokes. And if it helps cheer me up, I’ll consider my job well done.

I must warn you about those jokes. Firstly, they are translated from Russian and Hebrew by yours truly, which may cause them to lose some of their charm. Secondly, I’m not sure they came with that much charm to begin with, because my taste in jokes (or otherwise) can be politely characterized as “lowbrow”. In particular, all 3 jokes are based on the sewer/plumber metaphor. I didn’t consciously collect them based on this criterion, it just turns out that I can’t think of a better metaphor for programming.

By the way, I was recently told by a very strong programmer that of all things, he wanted to become a plumber as a kid. ’Cause it was very interesting to him, the tools, the pipes, how you make the whole thing work. And then he felt he understood enough of it, so he figured he’d become a programmer instead. And now he is, and he has enough (virtual) pipes full of (virtual) shit to keep him curious about how to make it work for the rest of his life. By which I mean to say, hey, it’s not just my bad taste, it is a good metaphor, see?

So, the jokes. Lowbrow, depressing stuff. You have been warned.

Expanding your skill set

A very important thing. You should be learning stuff. Yada yada.

With many things though, people have this strange tendency to avoid knowing them, and instead ask someone else unfortunate enough to already know them. Say, Makefiles. Is it just my experience or do people worldwide pretend to be incapable of dealing with a hairy Makefile, and leave its regularly scheduled tweaking to a small set of knowledgeable victims?

Or debugging of the lowest kind, with race conditions and creative memory corruption. People like to give up on that, as long as someone else can take over. “I just don’t know how to proceed”. Right.

Sometimes I wish I could put this claim to a test. Check if they’d say this at gunpoint. Or, more humanely and therefore much less cheaply, offer them $1M if they do know how to proceed. I bet they’d think a bit harder. If you’re working on AI, specifically on preparing it for the Turing test, don’t forget to teach it this principle, or else it has no chance of passing for a human.

I find that the following describes the double-edged sword that is skill set expansion quite well:

A plumber and his apprentice pay a visit to a manhole requiring their attention. The plumber goes down the manhole, and the apprentice stays above with the toolbox. The plumber asks for wrench #3, and the apprentice puts the wrench into his hand. 2 minutes pass. “Wrench #5!” The apprentice finds the wrench and passes it to the plumber. 5 more minutes. “Wrench #6!” The plumber is given that, takes a couple more minutes and finally comes out.

The next scene should really be a small piece of pantomime, but I’ll have to get by with words alone. Not unexpectedly for this type of joke, the plumber comes out with his arms covered with excrement. He slowly sweeps his right hand over his left arm, then the left hand over the right arm, shakes his hands and reaches for something to wipe them with. And to the apprentice he says:

“Watch and learn, son, or you’ll be passing wrenches for the rest of your life”.

Really, you should learn things. Expand your skill set. Who wants to be passing wrenches?

Layers of abstraction

Abstraction is good. Or should I say legitimate. Or should I say inevitable. I mean, you have to count on something. Something has to work, because you can’t build things on top of nothing.

Except it won’t work. That something you build things on top of won’t work.

What’s that? “Whining”? Yep, definitely. This here is whining.

Whining is good. Or should I say legitimate. Or should I say inevitable. Because if you aren’t allowed to whine about frigging data channels which drop chunks of data and duplicate chunks of data because some fucking hardware subcontractor couldn’t be bothered to implement arbitration for shared data access, if you aren’t allowed to whine about that…

If you aren’t allowed to whine about that, you should be allowed to whine about memory, which flips bits, and zeros bytes, and it does so once per hour for some weird sequence of accesses having nothing to do with the address where data actually changes. Fuck that, OK? Fuck DDR2. Fuck its controllers and the zillions of their configuration parameters.

A plumber climbs out of a manhole, this time without a preamble, and his arms are covered with - guess what? - excrement! A beautiful little girl in a beautiful white dress happens to pass by. The plumber seizes the opportunity and (another piece of pantomime) quickly, but firmly sweeps his hands over the girl’s white dress.

Little girl (appalled): AAAH!!

Plumber (outraged): Oh yeah? I bet you love to take a shit though.

Yep. You love to allocate objects in memory, don’t you? Megabytes of them. And then a board designer decides to wipe his filthy hands with your beautiful white huge software system. Debug that, you perverted memory-addicted individual.

Taking pride in your work

And still, I actually like my work, on a level. Why? It feels inherently cool to design stuff that becomes this bunch of tiny parts, transistors and all, switching hundreds of millions of times a second, and then to write code that manages all the flying circus.

I know people who feel the same about computer vision. People for whom it’s a personal priority to work on computer vision, where they are given images and they look for stuff in them. Who wants to be doing that? Who wants to be responsible for the solution of a problem that can’t even be precisely defined? Me, I wanna be doing hardware.

What do I actually do most of the time though? I eat hexadecimal. I sit near a debugger, and I keep hitting Page Up in a memory view window, to find the beginning of the array that overwrote this piece of data (I guess the element size from the repetitive patterns and such), and along comes a computer vision geek and he says, “damn it, man, you got out of the Matrix!”

Well, I dunno, I find it much easier to guess what buggy code did to my memory than to find out “why” an algorithm thinks this here is a person when in fact it’s the shadow of a tree. Because if you look closely at the pixels, the shadow kinda looks like a person, but of course we could reject it based on its motion, but of course that would mean we’d approve these reflections over here based on their motion, but, but, but…

What my bogus example is saying is that you have lots and lots of cues but each can work both for you and against you, and now how do you weigh all that, without even a formal spec? I’d rather eat hexadecimal, thank you very much.

And we look at each other, and sincerely think that our jobs are pretty nifty, but the other guy’s job is awful and how can he be doing it. And I suspect that if one looks at this from the side, one might wonder where the actual fun is, because there is actual fun in here, or so all the participants testify. And I think I know the answer.

An airplane lands, and passengers come out. One of them notices a guy underneath the airplane. As you’d guess, the guy is a plumber. The plumber touches some lock, and immediately gets covered by excrement streaming from an opening at the bottom of the plane.

The pantomime cleanup routine follows, and then comes the turn of the dialog.

Passenger (appalled): What on Earth makes you keep this job?

Plumber (proudly): Hey, I’m in the aerospace business!

The aerospace effect happens to different people with different things. With some, it’s “Hey, I’m making real hardware!” With others, it’s “Hey, I’m finding real objects in real images!” It’s a good thing people are different, because so are the currents of excrement, and someone ought to swim in each. We can’t all be passing wrenches.

I love globals, or Google Core Dump

The entire discussion only applies to unsafe languages, the ones that dump core. By which I mean, C. Or C++, if you’re really out of luck.

If it can dump core, it will dump core, unless it decides to silently corrupt its data instead. Trust my experience of working in a multi-processor, multi-threaded, multi-programmer, multi-nightmare environment. Some C++ FQA Lite readers claimed that the fact that I deal with lots of crashes in C++ indicates that I’m a crappy programmer surrounded by similar people. Luckily, you don’t really need to trust my experience, because you can trust Google’s. Do this:

  1. Find a Google office near you.
  2. Visit a Google toilet.
  3. You’ll find a page about software testing, with the subtitle “Debugging sucks. Testing rocks.” Read it.
  4. Recover from the trauma.
  5. Realize that the chances of you being better at eliminating bugs than Google are low.
  6. Read about the AdWords multi-threaded billing server nightmare.
  7. The server was written in C++. The bug couldn’t happen in a safe language. Meditate on it.
  8. Consider yourself enlightened.

This isn’t the reason why this post has “Google core dump” in its title, but hopefully it’s a reason for us to agree that your C/C++ program will crash, too.

I love globals

What happens when we face a core dump? Well, we need the same things you’d expect to look for in any investigation: names and addresses. Names of objects looking at which may explain what happened, their addresses to actually look at them, and type information to sensibly display them.

In C and C++, we have 3 kinds of addresses: stack, heap and global. Let’s see who lives there.

Except the stack is overwritten, because it can be. Don’t count on being able to see the function calls leading to the point of crash, nor the parameters and local variables of those functions. In fact, don’t even count on being able to see the point of crash itself: the program counter, the link register, the frame pointer, all that stuff can contain garbage.

And the heap is overwritten, too, nearly as badly. The typical data structure used by C/C++ allocators (for example, dlmalloc) is a kind of linked list, where each memory block is prefixed with its size so you can jump to the next one. Overwrite one of these size values and you will have lost the boundaries of the chunks above that address. That’s a loss of 50% of the heap objects on average, assuming uniform distribution of memory overwriting bugs across the address space of the heap.
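Here is a toy model of that failure mode. It is emphatically not dlmalloc's real layout, just a byte array where every block is a 4-byte size followed by its payload, with names I made up. It shows how one trashed size field makes everything above it unreachable for whoever walks the heap.

```cpp
#include <cstddef>
#include <cstring>
#include <stdint.h>
#include <vector>

typedef std::vector<unsigned char> Arena;

// Append a block: a 4-byte payload size followed by the payload itself.
void append_block(Arena& arena, uint32_t payload_size) {
    unsigned char header[sizeof(uint32_t)];
    std::memcpy(header, &payload_size, sizeof header);
    arena.insert(arena.end(), header, header + sizeof header);
    arena.insert(arena.end(), payload_size, static_cast<unsigned char>(0));
}

// Walk the arena the way a debugger script would: read a size, skip the
// payload, repeat. Stops at the first size that can't be right and
// returns the number of blocks found up to that point.
std::size_t count_walkable_blocks(const Arena& arena) {
    std::size_t offset = 0, count = 0;
    while (arena.size() - offset >= sizeof(uint32_t)) {
        uint32_t payload_size;
        std::memcpy(&payload_size, &arena[offset], sizeof payload_size);
        std::size_t remaining = arena.size() - offset - sizeof(uint32_t);
        if (payload_size == 0 || payload_size > remaining)
            break;  // garbage: everything from here on is lost
        ++count;
        offset += sizeof(uint32_t) + payload_size;
    }
    return count;
}
```

Build an arena with payloads of 8, 16, 8 and 24 bytes and the walk finds 4 blocks. Overwrite the size field of the third block (at byte offset 32) with 0xFFFFFFFF and the walk finds 2. And this is the lucky case, where the garbage is obviously garbage; a plausible-looking wrong size sends you off into the middle of somebody's payload instead.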

So don’t count on the stack or the heap. Your only hope is that someone has ignored the Best Practices and the finger-pointing by the more proficient colleagues, and allocated a global object. Possibly under the clever disguise of a “Singleton”. Not a bad thing after all, that moronic “design pattern”, because it ultimately allowed us to counter cargo cult programmers’ accusations of “globals are evil” with the equally powerful cargo cult argument of “it’s a design pattern”. So people could allocate globals again.

Which is good, because a global always has an accurate name-to-address mapping, no matter what atrocity was committed by the bulk of unsafe code running on the loose. Can’t overwrite a symbol table. And it has accurate type information, too. As opposed to objects you find through void*, or a base class pointer where the base class lacks virtual functions or the object vptr was overwritten, etc.

Which is why I frequently start debugging by firing an object view window on a global, or running debugger macros which read globals, etc. Of course you can fuck up a global variable to make debugging unpleasant. For example, if the variable is “static” in the C sense, you need to open the right file or function to display it, and you need the debugger front-end to understand the context, which will be especially challenging if it’s a static variable in a template function (one of the best things in C++ is how neatly its new features interact with C’s old ones).

Or you can stuff the global into a class or a namespace. I was never able to display globals by their qualified C++ name in, say, gdb 5. But no matter; nm <program> | grep <global> followed by p *(TypeOfGlobal*)addr always does the trick, and no attempts at obfuscating the symbol table will stop it. I still say make it a real, unashamed global to make debugging easier. If you’re lucky, you’ll get to piss off a couple of cargo cult followers as a nice side-effect.

Google Core Dump

A core dump is a web. Its sites are objects. Its hyperlinks are pointers. Its PageRank is a TypeRank: what’s the type of this object according to the votes of the pointers stored in other objects? The spamdexing is done by pointer-like bit patterns stored in unused memory slots. The global variables are the major sites with high availability you can use as roots for the crawling.

What utilities would we like to have for this web? The usual stuff.

  • Browsers. Debugger object view window is the Firefox, and the memory view window is the Lynx. The core dump Lynx usually sucks in that it doesn’t make it easy to follow pointers - can’t click on a word and have the browser follow the pointer (by jumping to the memory pointed by it). No back button, either. Oh well.
  • DNS. The ability to translate variable names to raw addresses. Works very reliably for globals and passably otherwise. Works reliably for all objects in safe languages.
  • Reverse DNS. Given an address, tell me the object name. Problematic for dynamically allocated objects, although you could list the names of pointer variables leading to it (Google bombing). Works reliably for global functions and variables. For some reason, the standard addr2line program only supports functions though. Which is why I have an addr2sym program. It so happened that I have several of them, in fact. You can download one here. “Reverse DNS” is particularly useful when you find pointers somewhere in registers or memory and wonder what they could point to. In safe languages, you don’t have that problem because everything is typed and so you can simply display the pointed object.
  • Google Core Dump, similar to Google Desktop or Google for the WWW. Crawl a core dump, figure out the object boundaries and types by parsing the heap linked list and the stack and looking at pointers’ “votes”, create an index, and allow me to query that index. Lots of work, that, some of it heuristical. And in order to get type information in C or C++, you’ll have to either parse the source code (good luck doing it with C++), or parse the non-portable debug information format. But it’s doable; in fact, we have it, for our particular target/debugger/allocator combo. Of course it has its glitches. Quirky and obscure enough to make open sourcing it not worth the trouble.

I really wish there was a reasonably portable and reliable Google Core Dump kind of thing. But it doesn’t look like that many people care about debugging crashes at all. Most core dumps at customer sites seem to go to /dev/null, and those that can’t be easily deciphered are apparently given up on until the bug manifests itself in some other way or its cause is guessed by someone.

Am I coming from a particularly weird niche where the code size is large enough and the development rapid enough to make crashes almost unavoidable, but crashes in the final product version are almost intolerable? Or do most good projects allocate everything on the stack and the heap, so with those smashed they’re doomed no matter what? Or is the problem simply stinky enough to make it unattractive for a hobby project while lacking revenue potential to make a good commercial project?

Would you like this sort of thing? If you would, drop me a line. In the meanwhile, I satisfy my wish for a Google Core Dump with my perfect implementation for an embedded co-processor, the one I’ve poked at with Tcl commands. With 128K of memory, no dynamic allocation, and local variables effectively implemented as globals, perfect decoding is easy. I’m telling ya, globals rule.

As to my “reverse DNS” implementation:

  • I could make it more portable by parsing the output of nm --print-size. But just running nm on a 20M symbol table takes about 2 seconds. I want instantaneous output, ’cause I’m very impatient when I debug.
  • Alternatively, I could make it more portable by using a library such as bfd. But that would drag in a library such as bfd, and I had trouble with what looked like library/compiler version mismatches with bfd, whereas my ELF parsing code never had any trouble. Also, an implementation parsing ELF is more interesting as sample code because you get to see how easy these formats are to parse. So it’s elfaddr2sym, not addr2sym. (It’s really 32-bit-ELF-with-host-endianness-addr2sym, because I’m lazy and it covers all my targets.)
  • There’s a ton of addr2sym code out there, and maybe a good addr2sym program. I just didn’t find it. I have an acknowledged weakness in the wheel reinventing department.
  • Of course I don’t demangle the ugly C++ names; piping to c++filt does.
  • The program is in D, because of the “instantaneous” bit, and because D is one of the best choices available today if you care about both speed and brevity. Look at this: lowerBound!("a.st_value <= b")(ssyms, addr) does a binary search for addr in the sorted ssyms array. As brief as it gets out of the box with any language and standard library, isn’t it? The string is compiled statically into the instantiation of the lowerBound template; a & b are the arguments of the anonymous function represented by the string. Readable. Short. Fast. Easy to use - garbage-collected array outputs in functions like filter(), error messages to the point - that’s why a decent grammar is a good thing even if you aren’t the compiler writer. Looks a lot like C++, braces, static typing, everything. Thus easy to pimp in a 3GL environment, in particular, a C++ environment. You can download the Digital Mars D compiler for Linux, or wait for C++0x to solve 15% of the problems with <algorithm> by introducing worse problems.
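For comparison, here is roughly what the “reverse DNS” lookup looks like in C++. This is a sketch of the idea, not a port of the D program: the Symbol struct and the addr2sym function are hypothetical stand-ins, and the ELF parsing that would fill the table is left out. It assumes the symbols are already sorted by address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified symbol table entry.
struct Symbol {
    std::uint64_t addr;  // start address (st_value in ELF terms)
    std::uint64_t size;  // size in bytes (st_size in ELF terms)
    std::string name;
};

// Returns "name+offset" for the symbol containing addr, or an empty string
// if addr falls outside every symbol. syms must be sorted by addr.
std::string addr2sym(const std::vector<Symbol>& syms, std::uint64_t addr) {
    assert(std::is_sorted(syms.begin(), syms.end(),
        [](const Symbol& a, const Symbol& b) { return a.addr < b.addr; }));
    // Binary search for the first symbol starting strictly after addr;
    // the only candidate is the symbol just before it.
    auto it = std::upper_bound(syms.begin(), syms.end(), addr,
        [](std::uint64_t a, const Symbol& s) { return a < s.addr; });
    if (it == syms.begin()) return "";
    --it;
    if (addr - it->addr >= it->size) return "";  // in a gap between symbols
    return it->name + "+" + std::to_string(addr - it->addr);
}
```

Given a table with a 0x100-byte g_table at 0x2010, addr2sym(syms, 0x2018) yields "g_table+8". Unlike addr2line, this works for variables as well as functions, which is the whole point. The comparator lambda is the C++ counterpart of the D string "a.st_value <= b"; judge the brevity for yourself.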

By the way, the std.algorithm module, the one with the sort, filter, lowerBound and similar functions, is by Andrei Alexandrescu, of Modern C++ Design fame. How is it possible that his stuff in D is so yummy while his implementation of similar things in C++ is equally icky? Because C++ is to D what proper fixation is to anaesthesia. There, I bet you saw it coming.

What does “global” mean?

For the sake of completeness, I’d like to bore you with a discussion of the various aspects of globalhood, in the vanishingly small hope of this being useful in a battle against a cargo cult follower authoring a coding convention or such. In C++, “global” can mean at least 6 things:

  • Number of instances per process. A “global” is everything that’s instantiated once.
  • Life cycle. A “global” is constructed before main and destroyed after main. A static variable inside a function is not “global” in this sense.
  • “Scope” in the “namespace” sense (as opposed to the life cycle sense). We have C-style file scope, class scope, function scope, and “the true global scope”. And we have namespaces.
  • Storage. A “global” is assigned a link time address and stored there. In a singleton implementation calling new and assigning its output to a global pointer, the pointer is “global” in this sense but the object is not.
  • Access control. If it’s in a class scope, it may be private or protected, which makes it less of a global in this fifth sense.
  • Responsibility. A global can be accessible from everywhere but only actually accessed from a couple of places. For example, you can allocate a large object instantiating lots of members near your main function and then call object methods which aren’t aware that the stuff is allocated globally.

So when I share my love of globals with you, the question is which aspect of globality I mean. What I mean is this:

  1. I like global storage - link-time addresses - for everything which can be handled that way. A global pointer is better than nothing, but it can be overwritten and you will have lost the object; better allocate the entire thing globally.
  2. I like global scope, no classes, namespaces and access control keywords attached, to make symbol table look-up easier, thus making use of the global allocation.
  3. I like global life cycle - no Meyers’ singletons and lazy initialization. In fact, I like trivial constructors/destructors, leaving the actual work to init/close functions called by main(). This way, you can actually control the order in which things are done and know what the dependencies are. With Meyers’ singletons, the order of destruction is uncontrollable (it’s the reverse order of initialization, which doesn’t necessarily work). Solutions were proposed to this problem, so dreadful that I’m not going to discuss them. Just grow up, design the damned init/close sequence and be in control again. Why do people think that all major operations should be explicit except for initialization which should happen automagically when you least expect it?
  4. “Globals” in the sense of “touched by every piece of code” is the trademark style of a filthy swine. There are plenty of good reasons to use “globals”; none of them has anything to do with “globals” as in “variables nobody/everybody is responsible for”.
  5. I think that everything that’s instantiated once per process is a “global”, and when you wrap it with scope, access control, and design patterns, you shouldn’t stop calling it a global (and instead insist on “singleton”, “static class member”, etc.). It’s still a global, and its wrapping should be evaluated by its practical virtues. Currently, I see no point in wrapping globals in anything - plain old global variables are the thing best supported by all software tools I know.
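A minimal sketch of the style described in items 1 to 3, with made-up names (Logger, Cache, init_all, close_all): plain global objects with trivial construction, and explicit init/close functions whose order you pick yourself.

```cpp
#include <cassert>
#include <cstdio>

// Plain structs with trivial constructors: nothing runs before main().
struct Logger { std::FILE* out; int lines_written; };
struct Cache  { int entries; bool ready; };

// Real, unashamed globals: link-time addresses, zero-initialized at load,
// visible in the symbol table under their plain names.
Logger g_logger;
Cache  g_cache;

void logger_init()  { g_logger.out = stderr; g_logger.lines_written = 0; }
void logger_close() { g_logger.out = nullptr; }

// The cache logs, so it depends on the logger being up already.
void cache_init() {
    assert(g_logger.out != nullptr);  // the dependency, stated explicitly
    std::fprintf(g_logger.out, "cache up\n");
    ++g_logger.lines_written;
    g_cache.entries = 0;
    g_cache.ready = true;
}

void cache_close() {
    std::fprintf(g_logger.out, "cache down\n");
    ++g_logger.lines_written;
    g_cache.ready = false;
}

// Meant to be the first and last things main() calls.
// The order is written down in one place instead of being left to the linker.
void init_all()  { logger_init(); cache_init(); }
void close_all() { cache_close(); logger_close(); }
```

If someone reorders init_all so that the cache comes up before the logger, the assert fires at a known point in the init sequence, with a call stack that makes sense. Compare that to finding out through whatever order the global constructors happened to run in.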

I think this can be used as “rationale” in a coding guideline, maybe in the part allowing the use of globals as an “exception”. But I keep my hopes low.

Ahem

To make an embarrassing story short:

  1. The merge scenario from the previous post doesn’t work the way I said it works in any DVCS it was tried on. I talked about the case of “merging merges” - when two people resolve the same conflict independently in different clones, and then someone pulls from both clones. More specifically, I mentioned the case where the conflict was between two similar patches, and each of the two people took a different patch when resolving the conflict. I claimed BitKeeper would then remove both patches in the final merge. Well, I’m still sure it did work that way for me once (an older version of bk?.. some specifics I didn’t notice?..) But bk 4 works differently; it seems to take the later conflict resolution (throwing away one of the patches in my scenario, but not both). Mercurial reportedly takes the earlier resolution, throwing away the other patch. Git and Bazaar reportedly require user intervention, possibly preventing damage by automerges at the cost of, well, requiring user intervention. bk and hg do manage to create a working version in my specific scenario, but of course throwing away one of the conflict resolutions isn’t always safe that way, it’s only safe in my scenario where both resolutions are basically equivalent. Anyway, while “merging merges” is specific to DVCS, no contemporary one seems to screw you nearly as badly as I described; “most vexing merge”, oh, c’mon. Also, it would be easy to guess that they all deal with this scenario differently, because this whole business of merging is heuristical, what are the chances for different heuristics to do the same thing? And in general it’s awfully lame to publish stuff first and check it later. I suck.
  2. And the overall excited mood of that article sucked, too, ’cause, like, c’mon, everybody knows that automerges can cut your fingers off, big deal, calm down. I mean, the worst merge-related bugs I dealt with came from automerges that any kind of version control system would allow. Two changes done in different files, that kind of thing. Coding in a “merge-friendly” way is something few people do, and it isn’t that easy. For example, you basically must never change semantics of definitions. If your function didn’t lock that semaphore, and now it does, then a call added in another branch, which was completely safe, can now cause a deadlock. So what are you going to do, modify the function name each time you change its “observable semantics”? Is everybody really that anal-retentive about it? I doubt that. But we all live with automerges because it’s cheaper to deal with their occasional damage than with the constant damage of manual merges, which take lots of time and are intolerably boring, thus very error prone. Which is why I prefer distributed systems and their better ability to merge long-living branches due to detailed recording of change history, even though long-living branches are extremely harmful. Harmful as they are, they will occasionally flourish, and then you need strong automerge, not manual merge, to end their evil lives. But anyway, who cares about the preferences of a person who doesn’t even bother to check his own trivially testable claims?

Blech.

DVCS and its most vexing merge

There’s this very elegant way to shoot your leg off with a DVCS. Here’s the recipe:

  1. I create a clone of the repository A and call it B. You create another clone of A and call it C.
  2. I add an if statement in B, fixing a bug. You fix the same bug in C in a very similar way.
  3. I pull your changes from C to B. There’s a conflict around the bug since we both fixed it. I take my patch, throwing away your nearly identical patch.
  4. Meanwhile, your roommate pulled both B and C into his clone, D. And he had to resolve that conflict, too. And he took your patch, and threw mine away.
  5. Now, D and B are pushed back to A. DVCS quiz: what happens?

Answer: the system has accurately recorded the manual merges, so it has enough information to make this new merge automatically. It sees your patch, and it throws it away as it should according to my manual merge. It sees my patch, and it flushes it down the toilet since that’s what your roommate said. Net result: both patches are gone, the bug is back in business. (Edit: it doesn’t work that way - bk version 4 does a different thing, and other systems reportedly do still other things. Do you still want to read the rest of this?..)

Which of course doesn’t matter, since it’s immediately discovered by the massive Automated Test Suite. For example, if it’s an OS kernel, each revision is automatically tested on all hardware configurations it should run on. And the whole process only takes 10 minutes, according to the Ten Minute Build XP Practice. No harm done, no reason to discuss it. I just thought it was a curious thing worth sharing.

Maybe it’s a well-known thing, but I don’t think it is, and if I’m right, it’s definitely lovable. For example, here’s what BitMover, maker of BitKeeper, the common ancestor of DVCSes, has to say about this:

“It’s important to note that because BitKeeper has a star topology and its possible to share data with any repository, it’s not necessarily recommended.”

What this is trying to say is that the graph of pulls shouldn’t be a generic graph and you’re better off with a tree. That is, I shouldn’t pull directly from you; we should both pull from and push to A. You and your roommate should also synchronize via A, or via A’s “child” repository, but then you shouldn’t push to A directly, only via that child, and so on. If we maintain this tree structure, the same conflict will never be resolved twice, and then we won’t get screwed when the merges are merged.

I wonder if you could detect the situations when you “merge merges”, that is, when the same conflict was resolved differently. You could then insist on human intervention and save those humans’ bacon. I’m too lazy to think this out and too stupid to effortlessly see the answer, so I’ll resort to a social heuristic like all of us uber-formal nerds do. The social heuristic is that Larry McVoy of BitMover has probably already thought about this, and found ways in which it could fail. So I’m not going to argue that BitKeeper merges are broken.

What I’m going to argue, at least for the next couple of paragraphs, is that it sucks when they tell you about their superstar topology and then explain that it’s best to avoid actually using it. Not only that, but they fail to mention a fairly frightening and, trust me, not at all unlikely scenario which could actually persuade you to follow their advice.

Because when they tell me “we have this simple model of how things work - repositories with complete local history and changes flowing between them - but you should arbitrarily restrict yourself to a subset of this model, for reasons we aren’t going to share with you, even though the general case works”, when they tell me that, my reply is “I alias rm to rm -f”. I understand how rm works, it’s fairly simple, and I don’t like to talk about it over and over again, “Are you sure?” - yes, I’m sure, thank you very much and good bye.

But the lovable part is, speaking of social heuristics, the lovable part is that BitMover said it right. Because if they mentioned that fairly frightening and not at all unlikely scenario, they’d scare people off rather than illustrate a point. On the other hand, when they say “It’s good practice to think about how the data should flow”, most people will nod and follow whatever advice they give them.

Just imagine a team of programmers engaging in the practice of thinking about how the data should flow, dammit, all on company time. Yeah, yeah, so BitKeeper earned a sarcastic comment on Proper Fixation. It’s still a small price to pay for getting your message to the majority of your users.

You see, the majority of programmers are not just “irrational” as we all are, but their reliance on reasoning doesn’t even exceed the mean of the population, which means they barely use reasoning at all, it’s pure gut feeling.

For example, I was writing a bunch of macros in a proprietary debugger extension language. A guy who came to talk to me looked over my shoulder, and I explained, “Debugger macros. Very useful, a crappy language though.” He said, “Yeah, looks like so.”

HE COULDN’T POSSIBLY KNOW THAT. I knew he couldn’t. How could he look at the code and realize that all variables were global? How could he know they were printed upon assignment, including loop counters (’cause it’s a “macro”, so it works just like assigning at the debugger console, which prints the variable)? He couldn’t know any of that. So why did he agree with the “crappy” part? Oh, I knew why.

“You mean it has dollar signs?” Silence. “You mean it prefixes variable names with the dollar sign, don’t you?” “Yeah, that.” “Well, I like the dollar signs, helps you distinguish between your macro variables and the variables of the debugged C program. Anything else?” “Well, um, yeah, it looks kinda primitive.” Low-end Ignorant Language Bigotry quiz: if “crappy” means “has dollar signs”, what does “primitive” mean? Answer: no type declarations. I’m sure it was that, although I didn’t go on and ask.

So that’s “engineers” for you. If you want to write programs or tech docs for the average engineer, keep that picture in mind. Or don’t. Aim for the minority, for people who don’t work that way, under the assumption that they are the ones that actually matter the most. I don’t know if this assumption is right, but at least it’s lofty.

Why DVCS?

For the record, I had my share of both centralized and distributed version control systems, and today I like it distributed and I wouldn’t ever want to go back, The Most Vexing Merge be damned. Why? I’ll share my reasons with you through the story of My Last CVS To BitKeeper Exodus. I think I’ll illustrate the engineers-and-reasoning-don’t-mix point as a by-product, because that’s how that story goes.

There recently was this argument about DVCS encouraging “code bombs”, a.k.a. “crawling into a cave”. I hadn’t heard either of these terms, so I’ve been using a term of my own - “accumulating critical mass”. The idea is to develop in your own corner, without integrating it with the main branch. You then show up with N KLOC of new stuff and kindly ask to merge it in.

Some people claimed this was particularly harmful for open source projects where there was no managerial authority to prevent this. Ha! Double ha. In an open source project, the key maintainers may say, “you see, we can’t integrate it when it’s done this way; we’re sorry, but you should have talked to us.” The changes will then be reimplemented or dropped on the floor.

Now, if you think that in a commercial environment a manager can easily decide to drop changes on the floor, together with the cost of implementing them and especially the cost of delaying the delivery of the features, if you think that, well, I wonder why you do. But perhaps a manager could insist on frequent integration? She could try, but she’d have to deal with real or imaginary cost of merges, increasing over time and getting in the way of deliveries. Of course there are perfect managers and perfect teams where it’s all dealt with appropriately, you just have to find them.

So yeah, “code bombing” is a problem, especially in commercial projects. But the idea that DVCS encourages it? Hilarity! It’s like saying that guns encourage murder. I prefer to think of guns as something that encourages fear of armed policemen, getting in the way of the natural instinct to club your neighbors to death. I mean, yeah, it’s easier to code bomb with a DVCS, but with a centralized system, people use code bombing - or clubbing? - even more, because merging is harder, the cost of merges increases more quickly and the ability to force integration is thus lower. The criminals are poorly equipped, but so is the police.

And this is exactly what happened to the last team stuck with CVS that I helped migrate to BitKeeper. Everybody had their own version made up of file snapshots taken at different times and merged with the repository version to different extents. A centralized system doesn’t record these local versions, so unless you immediately commit your merges, you are left with a version of a file which the system doesn’t know. This means that the next merges are going to be really hard, because you’ve lost your GCA, the greatest common ancestor. So instead of a 3-way merge, you’ll be doing a 2-way merge, which really sucks.

So I decided to not talk about the caves they were crawling into and the code bombs they were throwing at each other. Rather, I decided to show them how a 2-way merge couldn’t do what a 3-way merge could. I still think it’s the ultimate argument for DVCS, because DVCS is basically about accurate recording of all versions and not just the single time line of the main trunk. So the argument for it has to do with the benefits of such detailed recording.

So I gave this example where you start with a file having two lines:
aaa
bbb

And then I add a line, ccc, and you delete a line, aaa. If we have the GCA (a 3-way merge), then clearly the right result is deleting aaa and adding ccc, getting this:
bbb
ccc

But with a 2-way merge, we only have your file:
bbb

…and my file:
aaa
bbb
ccc

This can only be merged manually, because there’s no way to automatically figure out that you deleted aaa and I added ccc; for all the tool knows, you could have done nothing and I’ve added two lines, so the right merge is:
aaa
bbb
ccc

…canceling your change. So it has to be manual merge. Manual merge means dozens of boring deltas you have to inspect in each file. That’s what I call “costly”.
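The aaa/bbb/ccc example can be written down as a toy program. This is only a sketch under loud assumptions: a file is a list of unique lines, added lines are simply appended at the end, and the function names merge3 and ambiguous2 are made up. Real merge tools work on diff hunks and care about positions, which this ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using Lines = std::vector<std::string>;

static bool has(const Lines& file, const std::string& line) {
    return std::find(file.begin(), file.end(), line) != file.end();
}

// The toy model assumes no line appears twice in a file.
static bool all_unique(Lines file) {
    std::sort(file.begin(), file.end());
    return std::adjacent_find(file.begin(), file.end()) == file.end();
}

// 3-way merge: with the common ancestor we can tell deletions from additions.
Lines merge3(const Lines& base, const Lines& mine, const Lines& yours) {
    assert(all_unique(base) && all_unique(mine) && all_unique(yours));
    Lines out;
    // A base line survives only if neither side deleted it.
    for (const auto& l : base)
        if (has(mine, l) && has(yours, l)) out.push_back(l);
    // A line missing from base was added by one of us (appended here).
    for (const auto& l : mine)
        if (!has(base, l)) out.push_back(l);
    for (const auto& l : yours)
        if (!has(base, l) && !has(out, l)) out.push_back(l);
    return out;
}

// 2-way "merge": without the ancestor, every line present on one side only
// could be an addition or a deletion, so all we can do is list them
// for a human to decide.
Lines ambiguous2(const Lines& mine, const Lines& yours) {
    Lines out;
    for (const auto& l : mine)
        if (!has(yours, l)) out.push_back(l);
    for (const auto& l : yours)
        if (!has(mine, l)) out.push_back(l);
    return out;
}
```

With base {aaa, bbb}, mine {aaa, bbb, ccc} and yours {bbb}, merge3 returns {bbb, ccc} without asking anybody. ambiguous2 on the same two files returns {aaa, ccc}: two lines the tool can’t classify, in a two-line change. Scale that up to a real file and you get the dozens of boring deltas.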

Of course it doesn’t matter in a right world, where people integrate frequently and always commit their merged files to the centralized repository. Except it wasn’t so in the wrong world of the CVS developers I was “helping” to upgrade to new tools (for the last time in my life, people, for the last time in my life). And I thought we could avoid the discussion of the somewhat-technical-but-largely-social reasons of the constantly increasing cost of merges, and instead we could focus on the technical benefits of the 3-way merge and accurate GCA recording.

And of course I was wrong. The discussion immediately shifted to “we don’t need merges” because everything is “modular” and there’s a single owner to each file. Of course it wasn’t, and there wasn’t. Some things were used by everybody, like the awful build scripts and the DMA code. Some modules had two owners, or were in a transition state and had 1.5 owners, and so on. There were merges all over the place.

And if there weren’t merges and merge-related problems, how come everybody worked on their own “pirate” version which was never recorded in the main trunk and was made from a colorful variety of files partially merged at different times? How come changes propagated with cp and emacs-diff and not cvs update? And why was the tech lead so passionate about moving to BitKeeper which doesn’t let you partially update a repository so you have to merge everything? And why did everybody anxiously object that necessity if there were “no problems with merges”?

The final result: the tech lead simply forced the migration to bk. Everybody on the team hated me for my connection with the idea (I wasn’t on their team but I used to be a likable satellite and now became a hateful satellite). Developers who I thought were their best eventually (and I mean eventually) told me it was actually a good thing. So it wasn’t a bad closure. And still, I decided that I’m not going to “help” anybody “deploy” any kind of “tool” in this lifetime again, roughly speaking. Too many emotions for this programmer.

And this was supposed to show why I like DVCS, at least in the imperfect world where long-living branches occasionally happen, and the kind of reasoning I think is interesting in this context, and the kind of reasoning other people I came across found interesting. So there we are.

P.S. Why “most vexing”?

I thought I saw that “C++’s most vexing parse” from Scott Meyers’ Effective STL has its own Wikipedia entry, but apparently it doesn’t. It’s basically a variation on the theme of C++’s declaration/definition ambiguity, and I liked the term, especially the “most” part where parses are unambiguously sorted along the vexing dimension. So I figured “X’s most vexing Y” is a good template.

I’d like to use this opportunity to say that I skimmed through Effective C++, 3rd Edition, and… Where do I start? There’s this advice to create date objects with “Date d(Day(31), Month::april(), Year(2000))” or something. That is, create special types for the constructor arguments. Well, it doesn’t check that April comes without a 31st day, does it? The Date constructor could test for it though. Well, why not test for April 41st in the Date constructor, too, and, ya know, spare the users some keystrokes, if you see what I mean? The code is verbose. C++ compiler error messages are verbose. VERBOSITY EVERYWHERE! Help!
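Here is a sketch of the alternative I have in mind, with a made-up Date struct that takes plain ints and checks them in the constructor. It is an illustration of the argument, not anybody’s recommended date class.

```cpp
#include <cassert>
#include <stdexcept>

struct Date {
    int day, month, year;

    // Plain ints in, all range checks in one place.
    Date(int d, int m, int y) : day(d), month(m), year(y) {
        if (m < 1 || m > 12)
            throw std::invalid_argument("month out of range");
        if (d < 1 || d > days_in_month(m, y))
            throw std::invalid_argument("day out of range");
    }

    static bool is_leap(int y) {
        return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    }

    static int days_in_month(int m, int y) {
        assert(m >= 1 && m <= 12);  // callers validate the month first
        static const int days[] = {31, 28, 31, 30, 31, 30,
                                   31, 31, 30, 31, 30, 31};
        return (m == 2 && is_leap(y)) ? 29 : days[m - 1];
    }
};
```

Date(31, 4, 2000) and Date(41, 4, 2000) both throw at run time, which is when a wrapper type per argument would have had to check the value anyway. The one thing the wrapper types buy you is a compile-time error when somebody swaps the day and the month; whether that is worth the keystrokes is the question.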

This raises a question for the author: has he ever worked with a system where every piece of data comes covered with the toxic waste of overzealous static typing? But this borders on an ad hominem attack. And seriously, that sort of thing is to be avoided, at least until somebody proposes to have named constants for days or years and not just months.

So instead of the personal attack, I’ll ask Software Development Consultants, all of them, to kindly change the phrasing “it’s best to do so and so” to “I like so and so” or something. Because we have this huge crappy-dollar-sign crowd, and they copy style from each other like crazy, and their ultimate source of style is books by Software Development Consultants, and whenever they see a “best practice”, their common sense is turned off and they add the technique to the bag of tricks. So Consultants have a great power in this world. It doesn’t make the common sense shut-off feature their fault, but power they do have.

And with great power comes great responsibility, profanity deleted. I mean, you’re obviously giving advice neither you nor others have tested for very long, out of generic principles, profanity deleted. Like “prefer algorithms such as for_each to loops”, advice issued before fully compliant implementations of STL were even widely available, profanity deleted. Quite a piece of advice, that, profanity deleted. Couldn’t you at least phrase your advice in a little less self-assured way, fucking profanity deleted?

For example, Meyers has finally lowered the bridge and let the enemy of template metaprogramming occupy a notable share of pages in an Effective C++ edition. I still remember his promise to “never write about templates”, in the preface to Modern C++ Design, I think. And now the promise is broken. Hordes of clueless weenies are rushing into the minefield of template metaprogramming as we speak, since it’s now officially Mainstream C++. Can you imagine the consequences? I can’t. It’s too awful. I think I’ll go to sleep now.

Extreme Programming Explained

When I saw Kent Beck’s “Extreme Programming Explained” in our office, I was shocked. I’ve already accepted the inevitable occasional purchasing of obscure C++ wisdom, exemplified by titles such as “Effective C++”, “Exceptional C++”, “Imperfect C++”, “Modern C++ Design” and so on. Yikes. Oh well. As long as nobody uses the boost libraries in production code, they can entertain themselves by ordering whatever they want as far as I’m concerned.

But XP? A software development methodology? A programmer ordered a book on methodology? I have to find out who that was. Well, I found out and the shock became worse, because it was one of the cooler kids. Largely out of respect for that guy, I grabbed the book, took it home and read it. What follows is my book review. Spoiler: I say quite some positive things there.

XP and hire/fire dynamics

I didn’t look for quotes on this subject, it’s just that all of them caught my eye. Maybe it’s because HR implications of things are inherently interesting, maybe it’s because everything else in a “methodology” is life-threateningly boring, and maybe both. Anyway, the quotes are:

“Given the choice between an extremely skilled loner and a competent-but-social programmer, XP teams consistently choose the more social candidate.”

Gotta love the “but”. Well, what can I say about this approach? This approach would prevent the best pieces of software and hardware I’ve seen developed from happening. In all those cases, there was one or more “extremely skilled loners”. More on that later, when we get to code ownership, open spaces, pair programming and the like.

For now, it’s sufficient to say that a methodology preferring mediocre to “extremely skilled” and calling itself “Extreme Programming” is, um, interesting. That “competent-but-social” is a euphemism for “mediocre” is as clear to me as the fact that mediocrity, while related to lack of talent, is first and foremost a personal value.

“Here’s a sad but repeated story: a development team begins applying XP, dramatically improves quality and productivity, but then is disbanded, its leaders fired and the rest of the team scattered. Why does this happen? … The team’s improved performance shifted the constraint elsewhere in the organization. The new constraint (e.g. marketing, who can’t decide what they want fast enough) doesn’t like the spotlight.”

I have a fairly developed imagination, especially when it comes to organizational dysfunctions. For example, I can imagine how a team delivering a good result without working overtime will be appreciated less than a team delivering a bad result through heroic efforts (this example is from the book, too; of course the unappreciated team is the one “practicing” XP).

Getting fired over productivity? Because marketing can’t invent features fast enough? What??

Now, that certainly challenges one’s imagination skills. But we aren’t the kind of people to turn down a challenge, are we? OK, lez do it. Imagine. Imagine an organization so rotten, so disgusting, so peculiarly brain-damaged that it can fire people because they work too fast to invent new work for them. Hooray! We did it! I see it right in front of my closed eyes!

But wait, what’s that? Are those unfortunate victims of corporate idiocy Extreme Programmers? The competent-but-social programmers? If they’re so “social”, can’t they see what a stinky hole they’ve landed in? You don’t get fired from such places; you quit. I mean, realizing that you’re in a hopeless human quagmire is among the most basic social skills - even I have it.

I failed the challenge. I can’t imagine this picture.

Brad Jensen, Senior VP, Sabre Airline Solutions: “If programmers won’t pair or if they insist on owning code, have the courage to fire them. The rest of the team will bail you out.”

Aha. XP, the two-edged sword. Get fired for using XP or get fired for refusing to use it. Just ducky.

No, really. I don’t want to be mean. I won’t even pick on the VP’s “courage to fire” wording. You know what - I’ll even praise this wording. The employer depends on a strong programmer more than the programmer depends on them, and firing an evil, antisocial strong programmer is likely the most courageous act a manager can commit.

All I’m saying is, XP is basically this religious cult, as more quotes will show. This cult is spreading in your organization. They fire people. They easily fire exceedingly productive people, “skilled loners” that won’t “pair” and insist on code ownership - nothing evil or antisocial. They fire people just for refusing to accept their most controversial “practices”.

Could it be the real reason for the occasional extermination of productive XP teams, assuming Kent Beck is right and the disbanded XP teams he saw actually were productive? With all those “loners” I work with and the kind of “process” we use, I can easily imagine my reaction to a hypothetical XP epidemic at my workplace. Probably something along the lines of “Come and get me! Let’s see who gets fired first.”

XP “practices”

Why do I place the word “practice” in quotes? I dunno, do you know a better way to typographically mark stupidity? The word kinda reminds me of “spiritual practices”, the religion thing again. But primarily it reminds me that there’s just too much money in this industry. I mean, I have a relative who makes and sells ceramics. Neither she nor her employees use any “best practices”. There’s not enough money in ceramics for “practices”; you have to work. No time to listen to people talking about work, hence no market for their talks and writings.

This term is annoying on so many levels that I could go on for ages. For example, the words “best practices” and “questionable practices” convey an infinite amount of ignorant, arrogant idiocy. You could never pass an exam citing “best practices” - you are supposed to analyze the problem and prove your solution. But in the industry it’s perfectly possible to get away with, and even get promoted for, substituting rules of thumb for analysis, irrelevant to the case in point as they may be.

OK, enough said about the term “practices”; maybe it’s just me. Let’s talk about some practices.

Sit Together

XP likes open space. I don’t like open space. I’m exceptionally good at ignoring external events when I concentrate, to the point where it takes me 10 seconds to understand what you’re saying when you interrupt me in the middle of something. But there’s a limit to this autism, and when lots of conversations I can relate to, intellectually and emotionally, happen around me, I can’t work.

Many people have talked about the state of “flow” and how you don’t want to interrupt someone when they’re in it because they’re extremely productive, and how interrupting someone all the time means they work at a fraction of their full speed. Myself, I “value” communication as this would be called in XP. I like to talk to people, and people like to talk to me. Lots of them. And much as I like it, I like flow, too, and closing the door in my room is quite a Best Practice.

To be fair, with XP, you won’t be able to concentrate behind a closed door, either, because of…

Pair Programming

We’ll start with some quotes:

“Personal hygiene and health are important issues when pairing. Cover your mouth when you cough. Avoid strong colognes that might affect your partner.”

“When programmers aren’t emotionally mature enough to separate approval from arousal, working with a person of the opposite gender can bring up sexual feelings that are not in the best interest of the team.”

“In Figure 6 the man has moved closer to the woman than is comfortable for her. Neither is making his or her best technical decision at this point.”

“Programmers that won’t pair.” Sounds more like “programmers that won’t couple.”

“I like to program with someone new every couple of hours…”

OK, OK, this is neither emotionally mature nor in the best interest of the team. Enough with this sort of quote.

XP measures time in “pair-hours”, since all production code must be written in pairs. Since we programmers are good at math, it’s easy to see that this halves your task force. Kent Beck claims that pairs are more than 2x more productive than 2 people working alone, so it’s a net gain. Is that really so, even if each of the 2 people can close the door?
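Spelled out, the break-even arithmetic looks roughly like this. The symbols are mine, not the book’s: N is the number of programmers, r is what one of them ships per hour working alone, and k is how many times faster a pair is than a single programmer.

```latex
\[
\text{solo: } N r
\qquad
\text{paired: } \frac{N}{2} \cdot k r
\qquad
\frac{N}{2} \cdot k r > N r \iff k > 2
\]
```

So a pair has to out-produce one lone programmer by more than 2x before pairing becomes a net gain; at exactly 2x you merely break even.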

I worked in teams of size 2 or 3, with everybody going pretty much at full speed, and I can’t imagine gaining back the 2x spent on sitting together. Maybe if you’re doing something so extraordinarily boring and trivial that you can barely move without talking to someone about it.

No, really, I think XP is for writing huge piles of straightforward code. And here’s my proof.

Shared Code

As we’ve seen, in XP, there’s no code ownership unless you want to get fired. The code is owned by the team, and everybody can and is encouraged to change (“refactor”) any piece of code, any time.

This concludes the triad of XP practices that I’ll call “The Coupling Practices”, both because of my emotionally immature amusement with the quotes above and because of the coupling between everything (it’s one system owned by everyone, not separate modules with separate owners). The Coupling Practices are:

  • Open space
  • Work in pairs
  • No code ownership

Kent Beck raises just one objection to the shared code practice - people might act irresponsibly and make expedient changes. Well, paranoid attitude towards coworkers is not one of my many sins; if you trust someone to work for your company, then you trust them to act responsibly most of the time, don’t you think? My objections are different.

First, the coupling. I’m a great believer in Conway’s Law: Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations. I’m the great believer that I am simply because I’ve never seen this law fail in practice. Strong module boundaries mostly appear at social boundaries - code ownership and responsibility - or at technical boundaries (different programming languages, different processes, different machines, kernel space/user space).

If everybody owns all modules, and unless each module is a separate kernel module, they’ll quite likely become one big hairy program. And I don’t like coupling, at least not in this sense. Sure, XP encourages refactoring, which should take care of the problem; I just never saw anything beat Conway’s Law. To me, it’s like defeating the law of gravity with spiritual practices.

My second objection is that shared ownership can’t work except for trivial code, and this is the proof of the “XP is for trivial stuff” claim I’ve made above. Let me start with an example.

I used to think that ASIC engineers, the kind that write hardware description code in Verilog or VHDL, are sociopaths. I mean, the ones I’ve met were very helpful and patient when it came to explaining things to my feeble programmer mind, but apparently they couldn’t stand each other. If one of them even touched another’s code, the other guy would go to pieces. As a programmer, I was used to the Shared Code practice (you don’t need XP to have a happy tar pit of coders amicably patching each other’s stuff). So these hardware types scared me off.

And then I began to understand what you were saying in Verilog. In hardware, you have a bunch of variables, and you have code that computes the next values of variables given their current values. Hardware is like a huge recursive function of its registers’ state, with the reset logic being the base of the recursion. Now, it’s trivial to understand how a variable is updated - it’s spelled using the usual arithmetic and logic and if-then-else and stuff. But understanding how all these updates propagate, and what will happen in, say, 4 cycles, and how the pipelines work together is, I dunno, impossible.
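To make the “recursive function of its registers’ state” picture concrete, here’s a minimal C++ sketch. Everything in it is made up for illustration (the Regs struct, the three registers, the update rules); real Verilog looks nothing like this, but the shape is the same: a reset state, a next-state function, and a recursion tying them together.

```cpp
#include <cstdint>

// Hypothetical toy design: a counter feeding a two-stage pipeline.
struct Regs {
    std::uint8_t counter;  // free-running counter
    std::uint8_t stage1;   // pipeline stage 1: the counter, doubled
    std::uint8_t stage2;   // pipeline stage 2: stage 1, plus one
};

// Reset logic: the base of the recursion.
Regs reset_state() { return Regs{0, 0, 0}; }

// Next value of every register, computed from the current values only.
Regs next_state(const Regs& r) {
    return Regs{
        static_cast<std::uint8_t>(r.counter + 1),
        static_cast<std::uint8_t>(r.counter * 2),
        static_cast<std::uint8_t>(r.stage1 + 1),
    };
}

// State of the whole machine after the given number of clock cycles.
Regs state_after(unsigned cycles) {
    return cycles == 0 ? reset_state() : next_state(state_after(cycles - 1));
}
```

Each line of next_state is trivial to read. Predicting what state_after(4) holds is already a small exercise in tracing the pipeline by hand, and this toy has three registers, not thousands.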

So you see, when you have this mental model of how this monstrous state machine works, and someone makes a change to it, you tend to be upset, because it’s hard to read someone else’s change and update your mental model. And why are they so sure their change makes sense in the first place? They can’t possibly understand everything there is to understand! You could say that it’s all irrelevant since XP is about software, and this is hardware, but to me, the line is quite blurred. It’s code in a programming language. You can run this code on a simulator or an FPGA without manufacturing anything, just by copying bits. It’s software.

Now, obviously we have the other extreme - code so straightforward that you can read it from top to bottom and understand it completely. Say, a one-screen script shoveling through the file system, doing simple regexpy parsing and printing statistics. You know, Practical Extraction and Report. They even have a Language for it. The question is, where is most software - is it like a Verilog module or like a screenful of Perl? (BTW, my Verilog and Perl guru is the same person.)

Well, let’s see. Low-level interrupt handling and scheduling code is definitely like Verilog - touch my code and I’ll go to pieces. Compilers are also like Verilog, because you have complicated algorithms and you can’t just read them and say “aha, I see why this is doing what it’s supposed to be doing”. There are lots of heuristics and lots of knowledge about interaction of passes and complicated data structures. Optimized image processing code is also like Verilog, because there are all the precision considerations and knowledge about the target optimizer and platform. Computer vision code is also like Verilog… Everything I currently deal with is like that.

I mean, code is not like descriptive prose you’d put in a local wiki. You can’t read it, say “I get it!”, update it, and make the author happy with the “refactoring”. Code is more like poetry: change this line, and now the next line doesn’t rhyme, or you’ve broken the rhythm, or you’ve put angry words into a happy poem, that sort of trouble. Which is one reason to like code ownership.

It’s probably useful to have a second programmer with “read access” to important pieces of code, so that at least 2 people can help debug each piece. You can get there with code reviews, and without the Coupling Practices, which I can’t imagine working except for code so straightforward that I doubt there’s much of its kind.

XP and CMM

Brad Jensen, Senior VP, Sabre Airline Solutions: “The pure XP projects have very few defects… (Even) the (impure) XP projects have very competitive defect rates, one to two defects per thousand lines of code. The Bangalore SPIN, consisting of ten CMM level-five organizations, reports an average of 8 defects per thousand lines of code.”

“Defects per LOC”. Interesting metric. How do we improve it? First, we don’t record “defects”. What’s a “defect”? Are 10 reported problems caused by one defect or several? Depends on the “root cause analysis”, which is of course up to us coders. The second important thing to do is to write more code. That guy working on the data-driven layout rendering should be taken out and shot. We have editing macros for that. You could squeeze 50K LOC out of rendering if it was spelled as code like it should be.
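Here’s a minimal sketch of how the second trick plays out. The function names and the numbers are hypothetical (only the 50K figure echoes the paragraph above); the defect count is held constant while the code grows.

```cpp
// Defects per thousand lines of code, the metric from the quote.
double defects_per_kloc(int defects, int lines_of_code) {
    return defects * 1000.0 / lines_of_code;
}

// Assumed for illustration: 50 defects in a compact, data-driven 10K LOC version.
double data_driven_rate() { return defects_per_kloc(50, 10000); }

// The same 50 defects after the rendering is spelled out as 50K LOC of code.
double macro_expanded_rate() { return defects_per_kloc(50, 50000); }
```

Under these made-up numbers the rate drops from 5 to 1 defect per thousand lines without a single bug being fixed.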

Who is stupid enough to use a metric encouraging people to misattribute problem reports, increase code size, or simply quit the silly job? Why, it’s the Capability Maturity Model, of course. Meet XP’s competitor: the dreadful CMM.

I could have received a certificate stating that I was trained in CMM, but I couldn’t take it and asked my manager to bail me out (and so he did, THANKS!!). It was stupid to the point of physical pain. The CMM instructor, making $160/hour, talked like this: “Blah-blah-blah-PRACTICE!! Blah-blah-blah-ESTABLISH-AND-MAINTAIN!! Blah-blah-blah-GOAL!!” - yelling at some random word, so you couldn’t even fall asleep in the absence of any kind of rhythm your brain could learn to ignore. The sleep deprivation and the countless repetitions caused some of the garbage to be ingrained in my memory. Here’s an excerpt for ya.

CMM has 5 levels. CMM level 1 is where you operate right now: it denotes the ability to ship something. CMM level 5, the one mentioned in the interview from Kent Beck’s book, denotes complete paralysis. To do anything, you have to write or update so many documents, have so many meetings with “relevant stakeholders”, and to perform so many pointless measurements of the defects/LOC kind, that it’s much more productive to just go postal and at least remove a CMM auditor or two from the face of the Earth in the process.

Levels 2 to 4 indicate various intermediate stages where paralysis is spreading, but you can still ship. For example, level 2 is called “Managed Process”. “Managed Process is distinguished by the degree to which the process is Managed”. I’m not making this up. There’s a book called Capability Maturity Model Integration, and this book, heavy enough to kill a human, is full of this sort of stuff. Reading it is impossible.

And this is why I think XP is a great thing. (See? I promised I’d say positive things; and I’m not done yet!) I’ve actually read Extreme Programming Explained. About 75% of it seemed meaningless, but I made it through. The CMMI book, on the other hand, is pretty much infeasible.

CMM has hundreds of practices. XP has a couple dozen. CMM has an extremely costly certification process. XP doesn’t. CMM forces you to write a zillion documents. XP forces you to write a zillion tests, documentation of features, and story cards. CMM is about lengthening the development cycle. XP is about shortening the development cycle.

CMM is inflicted on you by customers who believe the lies of the worthless bastards from the Software Engineering Institute that their “process” will make you ship quality software. XP attempts to establish itself as an alternative legitimate “process” in the realm of suspicious customers and costly vendors. XP tries to sell commonsense stuff like automated testing to the desperate programmers working on a single-customer product with scarce resources. XP also tries to sell itself to the customer, relieving the desperate vendor’s programmers from the insane, intolerable overhead of level 5 CMM paralysis.

XP brings hope to dark, wet, stinky corners around the world. I sincerely think it’s great. The permanent brain damage of that CMM training course makes me admire XP.

But then there are people who work for product companies and inflict XP upon themselves. It’s like paying taxes the government doesn’t ask you to pay. XP is an alternative to CMM. You need it if your customer or manager requires you to use a ready-made methodology, out of lack of trust. If you don’t have that problem, make yourself your own “process”. You are blessed with an opportunity to just work, and organize your work as you go and as you see fit. Why “practice” stuff when you can just work?

How many product companies use CMM, XP or any other ready-made process, and how many make their own process? How many successful companies borrow existing “methodologies” without being forced to do so? I don’t know the numbers, but my bet is that most of them don’t.

XP and religion

“EMBRACE CHANGE”

This quote is right from the book cover. “Extreme Programming Explained. EMBRACE CHANGE.” Does it freak you out the way it freaks me out? Maybe it’s because of the cultural gap created by my Russian origins? Nay, I know plenty of English slogans I can relate to. Say, “Trust a condom”. Beats “Embrace change” hands-down. Changes come in two flavors, good and bad. Should I “embrace” both kinds?

“Even programmers can be whole people in the real world. XP is an opportunity to test yourself, to be yourself, to realize that maybe you’ve been fine all along and just hanging with the wrong crowd.”

Is this a religious cult or what?

“The key to XP is integrity, acting in harmony with my true values… The past five years have been a journey of changing my actual values into those I wanted to hold.”

“Journey”. Talking about being good. Do you like hippies? I like hippies more than nazis. I like XP more than CMM. But IMO the hippie world view and general style is suboptimal.

“With XP, I work to become worthy of respect and offer respect to others. I’m content to do my best and strive always to improve. I hold values I’m proud of and act in harmony with those values.”

“I have seen people applying XP bring renewed hope to their software development and their lives. You know enough to get started. I encourage you to start right now. Think about your values. Make conscious choices to live in harmony with them.”

It’s a religion, people.

I’m not strictly against religion. I even have vague respect towards religious people, those stating that their purpose in life is to be good and that they know how to do it. (I similarly respect the intentions and discipline of various development methodology followers.) Of course when a religious person behaves like an asshole, it’s way more annoying than when a secular person does, because the latter at least doesn’t state that he’s all about being good and living right. (I noticed that many people who love to talk about “proper process” commit the worst atrocities.)

Some religions or sects are awful, to the point of human sacrifices. (CMM.) Most modern mainstream religions are kinda nice though, and they promote good Values and Practices, like not killing, not stealing and helping each other. (XP promotes automated testing, short build, integration and release cycles.) Many religions have peculiar regulations when it comes to sex. (XP has The Coupling Practices.) This is when religious people start to get annoying, to the point of throwing stones at people who are improperly dressed, not to mention engaging in forbidden sexual relationships. (Refuse to “pair” and XP gets you fired.)

Interestingly enough, the majority of religious people feel much stronger about the peculiar regulations of their religion than the universally recognized atrocities, which are also forbidden by their religion. They never gather around prisons to throw stones at convicts going on vacation; they are more likely to do so near a gay parade. (The Senior VP from Kent Beck’s interview didn’t recommend firing over lack of tests, which all of us would recognize as a crime; he recommended firing over the violation of The Coupling Practices.)

And this is when my patience towards religious people evaporates. So there’s a gay parade. Got a problem with it? Shove it! I’m not gay, but the idea of harassing people based on this sort of criteria is disgusting. (I think I can blend in very well in a team engaging in pair programming and collective ownership, but I know awesome programmers who can’t work that way; just try to get them fired.)

But still, religion is not all bad. I think that there are religious communities which are less afflicted by drug abuse compared to your average secular community. (It’s not unlikely that many XP teams have more tests and fewer bugs than “secular” teams using no “standard” process.) Few people have the courage and the patience needed to openly attack religion; in particular, holy scriptures tend to be long and obscure. (Skim through an XP or especially a CMM book and try to stay awake or find refutable claims.)

Young people who aren’t cynical enough to realize that bullshit is free and hence everybody is going to bullshit you if only you let them tend to be an easy target for religions seeking expansion. (When I started working as a programmer, I was shocked how messy everything was, and cheerfully welcomed every “methodologist” about to create order out of the chaos. Today, even saying “Best Practice” near me is not recommended.)

Summary

I treat XP as a religious lifestyle (true of any methodology). XP is not a bad religion. Its primary virtue is the threat it poses to the cannibalistic cults such as the CMM. While I don’t want to be anywhere near XP churches, I think religion is an inevitable attribute of human existence, and as far as religions go, XP is not at all bad.