OO C is passable

June 2nd, 2008

My problem with C++ bashing is that I'm overqualified. Ever since I've published C++ FQA Lite, I've tried to stay out of "C++ sucks" discussions. I've said everything before and I don't want to say it again. And then much of the C++ arcana is finally fading from my memory; good, no need to refresh it, thank you.

I don't always quit those discussions though. How should I do that? If I were someone else, I could send a link to the C++ FQA to end the discussion ("if you really care, check out yada yada"). But I can't use this link myself, because "OMG, did you actually write a whole site about this?!"

So the last time I didn't quit this discussion, I said: "You know, at some point you actually start to prefer plain C". The seasoned C++ lover replied: "You mean, the C with classes style, with no tricky stuff? Sure, don't we all end up writing most of the code like that?" No, said I, I meant plain C. No plus signs attached. File names ending with .c. C without classes. C.

"Now that is fanaticism," said the guy. "What could be the point of that? You know what? You may like C++ or you may dislike it, but face it: C just isn't good enough".

Fanaticism?! Now that is an insult. Yes, I recently decided to write a bunch of code in plain C inside a C++ code base. I did this after maintaining the previous, C++ implementation of that stuff for 4 years. For at least 3 of those years, I didn't do any significant rewriting in that code, because it could interfere with schedules and it would complicate merges. Although some pieces were hopelessly awful. Sloppy text parsing interleaved with interrupt handling (I exaggerate, but only very slightly). But it worked, so it was hardly urgent to fix it.

And then it had to be ported to a new platform. And I obsessed over the rewrite-or-extend question for a lot of time. There was a critical mass of incompatible new stuff, so I chose "rewrite". But. I decided to write the new version in C++. Why? Because it's a C++ code base, and people are used to C++ and its style and idioms, however brain-damaged. "So you, YOU will do it in C++ after all?! You're a wuss," said my manager, an old-time C++ hater. Time for the rhetorical question. Does this sound like the story of a fanatic?

OK then, why did I decide to do it in C after all, you may ask. I'll tell you why. I did it because everybody was sick and tired of the build time of my auto-generated C++ code.

You see, the whole thing is data-driven. The data objects describing its workload are generated at build time. Do you know any way of generating C++ code that doesn't compile slowly as hell? I don't. I've generated both "real" code (functions doing stuff) and "data definition" code (which does practically nothing except for calling object constructors). And it always compiles slowly. Sometimes turning optimization off speeds up compilation significantly, sometimes less significantly. But the best part is this: you never know exactly what's your problem. Try to profile a C++ compiler.

It's not that manually written C++ code is such a blast. For example, we have a ~2K LOC file defining a class template and explicitly instantiating it 4 times. It worked fine for years, until it met a particular version of the Green Hills C++ compiler. That version decided that it needs more than 1.5G of memory to compile that file. I don't know how much more exactly, because at that point my workstation suffocated, and could only breathe again when the process ran out of swap space and died. Here's another rhetorical question: how the fuck are you supposed to figure out what the fuck causes this problem?

What's that? "It's a tool problem, not a language problem?" Bzzzt, wrong answer! It's neither a language problem nor a tool problem; it's my problem, because I must fix it. And this is why I don't want to deal with a language that consistently breeds tools which create such problems for me. But since I'm hardly a fanatic, and I know exactly why I do want to work on this particular C++ code base, I hold my nose and I delve right into the pile of excrements and find out that if you instantiate each template in its own file, then the process memory consumption barely crosses the 350M mark. Nice and compact, that. So, let's use 4 files.

Nope, manually written C++ code isn't a picnic. But auto-generated code is worse, because it relies on some set of features and uses them a lot. The number of uses per feature per file matters. 1 explicit template instantiation per file = 350M of compiler process memory. 4 instantiations = out of memory. What about "simpler" features, but hundreds of uses? The compiler will grind to a halt for sure. Um, you might say, don't other languages have "features" which will be used hundreds of times by generated code? Yes, they do. Those features just DON'T SUCK quite as impressively. Go ahead, show me a problem with compilation speed anywhere near what C++ exhibits in another language.

Face it: C++ just isn't good enough. "If you really care about C++ parsing complexity, check out the FQA yada yada". I wrote "a whole site" about it, you know. Bottom line: if you generate native code, you want to define your object model such that the generated code can be C code. Assembly is pointlessly low-level and non-portable, and C++ sucks. Believe me, or die waiting for your code to compile.

So, C. My object model will be in C. Um. Bummer. It's an OO thing, with plugins and multiple inheritance and virtual inheritance. It has to be. You have orthogonal plugins which want to derive classes from a common base – a classic diamond hierarchy. Well, I can have a couple of macros for doing MI-style pointer arithmetic, by fetching the derived-type-specific offset of each base class object at run time. No big deal. I even like it better than the C++ MI downcasting syntax – at least you know exactly what you're doing, and you don't need to think whether it should be dynamic_cast or static_cast or eat_flaming_death_cast to really work.

But I miss virtual functions. I really do. I sincerely think that each and every notable feature C++ adds to C makes the language worse, with the single exception of virtual functions. Here's why not having virtual functions in C sucks:

An impressive list, isn't it? You can see from it that I've really put my employer's money where my mouth is and actually worked with OO C for a while. Aren't C++ classes with virtual functions simply better? No, because C++ classes don't support aggregate initialization. If you have C structures with vtable pointers, you can use the frob_type g_obj={&vtable,5,"name"} initialization style. This translates to assembly that looks like so:

g_obj:
.word vtable
.word 5
.word .str34
.str34:
.asciz "name"

This compiles and loads as fast as it gets. Now, if you choose real C++ vtables, you rule out aggregate initialization once and for all. Your initialization will instead have to be spelled as frob_type g_obj(5, "name"), and even if you have no constructor arguments, C++ will generate a constructor to set the vtable pointer.

The good news: at least the explicit reference to the vtable in our source code is gone. The bad news: with many objects, the C++ version compiles very slowly (I've checked with GNU and Green Hills C++). It also generates a much larger image, since it emits both global data objects and assembly code copying them into object members at run time. The latter code also costs you load time. And if you crash there, good luck figuring out the context. But as far as I'm concerned, the worst part is the build time explosion.

Yes, yes. It's not important. And it's FUD. And it's a tool problem. A "good" compiler could optimize the C++ version and generate the same assembly as the C version. And it could do it quickly. Those compiler writers are just lame. I completely agree with you, sir. Just go away already.

By the way, the same trade-off happens with C++ containers, like std::vector. They're better than {T*base;int size;} structures because you have the shortcut of operator[] (as opposed to array->base[i]). And because debuggers can gracefully display all elements of std::vector as a list of the right size. Some of the debuggers can. Sometimes. Sometimes it breaks. But when it doesn't, it's fun. But, again, you can't use aggregate initialization once your structure has a std::vector in it. And C++0x won't solve it, because its pseudo-aggregate initializers are syntactic sugar, and my problem here isn't the syntax, it's the build time.

And std::vector forces allocation of data on the heap (let's not discuss custom allocator templates, 'cause I'm gonna vomit). Can't have the base address of a std::vector point to a global variable generated specifically to hold its data.

I like things to point to globals generated to hold their data. Helps a lot when you debug, because your pointer is now firmly associated with a symbol table name. And no matter what memory-corrupting atrocity was committed by buggy code, that association will be there to help you figure things out. And heap corruption is very common in C++, because it's a completely unsafe language. So I care a lot about debugging core dumps with corrupted memory. Which is why base, size structures get my vote.

And that's an important point: you can live with just safety, or just control, and of course with both, but if you have neither safety nor control, then, sir, welcome to hell. Which is why OO C is worse than OO in Java or Lisp or PowerShell, but better than OO in C++.

And OO C is not all bad after all. Manually faking virtual functions has its benefits:

Of course, you can use all those tricks in a C++ object model when needed, and use virtual functions elsewhere. And if C++ didn't suck so much (for example, if code compiled reasonably fast and if it had a standard ABI and and and...), that would be the way to go. But since C++ just isn't good enough, and I have to use a pure C object model, I point out the benefits of the sucky thing that is OO C to make it a little less bitter.

Overall, OO C is passable, more so than the endless rebuild cycles caused by our previous C++ object model. So you have a bunch of lame stuff. Having to use a typedef because struct A can only be referred to as struct A, not A. No default function arguments. Having to declare variables at the beginning of the scope. (BTW, I use C89. I don't believe in C99. I'm a Luddite, you already know that, right? I don't think C99 is as portable as C89 here in 2008).

So, yeah. I do miss some C++ features, but it's a small itching here, a tiny scratching there, nothing serious. I can live with that as the price for not having my legs shot off by mysterious compile time explosions and other C++ goodies. OO C is good enough for me. Fanaticism? You be the judge.