OO C is passable

June 2nd, 2008

My problem with C++ bashing is that I'm overqualified. Ever since I published C++ FQA Lite, I've tried to stay out of "C++ sucks" discussions. I've said everything before and I don't want to say it again. And then much of the C++ arcana is finally fading from my memory; good, no need to refresh it, thank you.

I don't always quit those discussions though. How should I do that? If I were someone else, I could send a link to the C++ FQA to end the discussion ("if you really care, check out yada yada"). But I can't use this link myself, because "OMG, did you actually write a whole site about this?!"

So the last time I didn't quit this discussion, I said: "You know, at some point you actually start to prefer plain C". The seasoned C++ lover replied: "You mean, the C with classes style, with no tricky stuff? Sure, don't we all end up writing most of the code like that?" No, said I, I meant plain C. No plus signs attached. File names ending with .c. C without classes. C.

"Now that is fanaticism," said the guy. "What could be the point of that? You know what? You may like C++ or you may dislike it, but face it: C just isn't good enough".

Fanaticism?! Now that is an insult. Yes, I recently decided to write a bunch of code in plain C inside a C++ code base. I did this after maintaining the previous, C++ implementation of that stuff for 4 years. For at least 3 of those years, I didn't do any significant rewriting in that code, because it could interfere with schedules and it would complicate merges. Although some pieces were hopelessly awful. Sloppy text parsing interleaved with interrupt handling (I exaggerate, but only very slightly). But it worked, so it was hardly urgent to fix it.

And then it had to be ported to a new platform. And I obsessed over the rewrite-or-extend question for a long time. There was a critical mass of incompatible new stuff, so I chose "rewrite". But. I decided to write the new version in C++. Why? Because it's a C++ code base, and people are used to C++ and its style and idioms, however brain-damaged. "So you, YOU will do it in C++ after all?! You're a wuss," said my manager, an old-time C++ hater. Time for the rhetorical question. Does this sound like the story of a fanatic?

OK then, why did I decide to do it in C after all, you may ask. I'll tell you why. I did it because everybody was sick and tired of the build time of my auto-generated C++ code.

You see, the whole thing is data-driven. The data objects describing its workload are generated at build time. Do you know any way of generating C++ code that doesn't compile slowly as hell? I don't. I've generated both "real" code (functions doing stuff) and "data definition" code (which does practically nothing except for calling object constructors). And it always compiles slowly. Sometimes turning optimization off speeds up compilation significantly, sometimes less significantly. But the best part is this: you never know exactly what your problem is. Try to profile a C++ compiler.

It's not that manually written C++ code is such a blast. For example, we have a ~2K LOC file defining a class template and explicitly instantiating it 4 times. It worked fine for years, until it met a particular version of the Green Hills C++ compiler. That version decided that it needs more than 1.5G of memory to compile that file. I don't know how much more exactly, because at that point my workstation suffocated, and could only breathe again when the process ran out of swap space and died. Here's another rhetorical question: how the fuck are you supposed to figure out what the fuck causes this problem?

What's that? "It's a tool problem, not a language problem?" Bzzzt, wrong answer! It's neither a language problem nor a tool problem; it's my problem, because I must fix it. And this is why I don't want to deal with a language that consistently breeds tools which create such problems for me. But since I'm hardly a fanatic, and I know exactly why I do want to work on this particular C++ code base, I hold my nose and I delve right into the pile of excrement and find out that if you instantiate each template in its own file, then the process memory consumption barely crosses the 350M mark. Nice and compact, that. So, let's use 4 files.

Nope, manually written C++ code isn't a picnic. But auto-generated code is worse, because it relies on some set of features and uses them a lot. The number of uses per feature per file matters. 1 explicit template instantiation per file = 350M of compiler process memory. 4 instantiations = out of memory. What about "simpler" features, but hundreds of uses? The compiler will grind to a halt for sure. Um, you might say, don't other languages have "features" which will be used hundreds of times by generated code? Yes, they do. Those features just DON'T SUCK quite as impressively. Go ahead, show me a problem with compilation speed anywhere near what C++ exhibits in another language.

Face it: C++ just isn't good enough. "If you really care about C++ parsing complexity, check out the FQA yada yada". I wrote "a whole site" about it, you know. Bottom line: if you generate native code, you want to define your object model such that the generated code can be C code. Assembly is pointlessly low-level and non-portable, and C++ sucks. Believe me, or die waiting for your code to compile.

So, C. My object model will be in C. Um. Bummer. It's an OO thing, with plugins and multiple inheritance and virtual inheritance. It has to be. You have orthogonal plugins which want to derive classes from a common base – a classic diamond hierarchy. Well, I can have a couple of macros for doing MI-style pointer arithmetic, by fetching the derived-type-specific offset of each base class object at run time. No big deal. I even like it better than the C++ MI downcasting syntax – at least you know exactly what you're doing, and you don't need to think whether it should be dynamic_cast or static_cast or eat_flaming_death_cast to really work.
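
Here's roughly what such macros could look like. This is a minimal sketch, not the actual code from that code base: the names (layout, node, drawable, serial, widget, gadget, CAST_PART) are all made up for illustration, and it assumes that every part of an object starts with a pointer to a per-type table of offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Per-concrete-type table: where each part lives inside the complete object.
   (Hypothetical layout scheme, one table per "most derived" type.) */
struct layout {
    size_t node_off;
    size_t drawable_off;
    size_t serial_off;
};

/* The shared base of the diamond, and two orthogonal "plugin" parts.
   Each part starts with a pointer to the layout of its complete object. */
struct node     { const struct layout *layout; int id; };
struct drawable { const struct layout *layout; int color; };
struct serial   { const struct layout *layout; int version; };

/* Go from one part to another part of the same object:
   subtract our own offset, add the target's, both fetched at run time. */
#define CAST_PART(ptr, from_off, to_type, to_off) \
    ((to_type *)((char *)(ptr) - (ptr)->layout->from_off \
                               + (ptr)->layout->to_off))

/* Two concrete types with different layouts, one shared node each. */
struct widget { struct drawable drawable; struct serial serial; struct node node; };
struct gadget { struct node node; int pad; struct drawable drawable; struct serial serial; };

const struct layout widget_layout = {
    offsetof(struct widget, node),
    offsetof(struct widget, drawable),
    offsetof(struct widget, serial)
};
const struct layout gadget_layout = {
    offsetof(struct gadget, node),
    offsetof(struct gadget, drawable),
    offsetof(struct gadget, serial)
};

struct widget g_widget = {
    { &widget_layout, 7 }, { &widget_layout, 2 }, { &widget_layout, 42 }
};
struct gadget g_gadget = {
    { &gadget_layout, 1 }, 0, { &gadget_layout, 9 }, { &gadget_layout, 3 }
};

/* "Downcast" from the common base to a plugin part, whatever the real type. */
struct drawable *node_to_drawable(struct node *n)
{
    assert(n != NULL && n->layout != NULL);
    return CAST_PART(n, node_off, struct drawable, drawable_off);
}

/* "Upcast" from a plugin part to the shared base. */
struct node *serial_to_node(struct serial *s)
{
    assert(s != NULL && s->layout != NULL);
    return CAST_PART(s, serial_off, struct node, node_off);
}
```

The same node_to_drawable() works on the node inside g_widget and on the node inside g_gadget, although the two types place their parts at different offsets. The price is one layout pointer per part and a couple of loads per cast.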

But I miss virtual functions. I really do. I sincerely think that each and every notable feature C++ adds to C makes the language worse, with the single exception of virtual functions. Here's why not having virtual functions in C sucks:

You can't quite fake them with C macros.
A virtual function call is a shortcut for obj->vtable->func(obj, args). The OO spelling – obj->func(args) – is of course better.
You'll usually try to make the C version shorter: obj->func(args), obj->vtable->func(args), or obj->func(obj, args). Quite likely you'll find out that you really needed to pass obj to func and/or the vtable level of indirection. Updating the code may be tedious/too late/really annoying (because of having to admit a stupid mistake). The eventual lack of call syntax uniformity will also be annoying.
Decent C++ debuggers automatically downcast base class object pointers to the real run time type when you inspect the object, even when C++ RTTI support is turned off at compile time. They do it by looking at the vtable pointer. Achieving this with OO C is only possible on a per-OO-faking-style, per-debugger basis, using the ugly debugger scripting facilities. Most likely, you won't do it, and you'll choose interactive suffering each time you debug the code, having to figure out the actual type yourself and cast pointers manually.
With virtual functions, base class implementations are inherited automatically. With explicit vtable structures, you need to either have links to base class implementations (the slow MFC message table way), or you need to explicitly and fully fill the vtables in each derived class. Possibly using the C default of trailing zeros in aggregate initializers as in vtable_type vtable={foo_func,bar_func} /* and the third member, baz_func, is 0 - we check for zero vtable entries before calling our pseudo-virtual functions */. Run time checks for null function pointers can make initialization code smaller, but they also make function calls slower.
With explicit vtable initializers, you only see the position of the function in the vtable initializer and its "derived class" name (my_class_baz_func), not its "base class" name (baz_func). You are likely to have a slightly inconsistent "derived class method" naming convention, making it annoying to figure out exactly which base class function we're overriding here.
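
To make the list above concrete, here is a small sketch of the explicit-vtable style with zero entries checked at call time. The names follow the foo_func/bar_func/baz_func example; struct object, the base_ prefix, VCALL and demo_calls are made up, and falling back to a base implementation when an entry is zero is my assumption about what the check would do, not the only option.

```c
#include <assert.h>
#include <stddef.h>

struct object;

/* The explicit vtable: one named function pointer per pseudo-virtual function. */
struct vtable_type {
    int (*foo_func)(struct object *self, int arg);
    int (*bar_func)(struct object *self, int arg);
    int (*baz_func)(struct object *self, int arg);
};

struct object {
    const struct vtable_type *vtable;
    int value;
};

/* "Base class" implementations. */
int base_foo_func(struct object *self, int arg) { return self->value + arg; }
int base_bar_func(struct object *self, int arg) { return self->value - arg; }
int base_baz_func(struct object *self, int arg) { return self->value * arg; }

/* "Derived class" overrides foo and bar only. */
int my_class_foo_func(struct object *self, int arg) { return 100 + self->value + arg; }
int my_class_bar_func(struct object *self, int arg) { return 200 + self->value - arg; }

/* The third member, baz_func, is left out and therefore initialized to 0. */
const struct vtable_type my_class_vtable = { my_class_foo_func, my_class_bar_func };

/* The full call spelling, obj->vtable->func(obj, args), plus the zero check.
   A zero entry falls back to the base implementation (an assumption here). */
#define VCALL(obj, func, arg) \
    ((obj)->vtable->func ? (obj)->vtable->func((obj), (arg)) \
                         : base_##func((obj), (arg)))

struct object g_obj = { &my_class_vtable, 5 };

/* Calls all three pseudo-virtual functions and adds up the results. */
int demo_calls(struct object *obj, int arg)
{
    assert(obj != NULL && obj->vtable != NULL);
    return VCALL(obj, foo_func, arg)
         + VCALL(obj, bar_func, arg)
         + VCALL(obj, baz_func, arg);
}
```

Note how the initializer only shows my_class_foo_func in the first position: nothing at that spot says "this overrides foo_func", and every call pays for the extra test of the function pointer.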

An impressive list, isn't it? You can see from it that I've really put my employer's money where my mouth is and actually worked with OO C for a while. Aren't C++ classes with virtual functions simply better? No, because C++ classes don't support aggregate initialization. If you have C structures with vtable pointers, you can use the frob_type g_obj={&vtable,5,"name"} initialization style. This translates to assembly that looks like so:

g_obj:
.word vtable
.word 5
.word .str34
.str34:
.asciz "name"

This compiles and loads as fast as it gets. Now, if you choose real C++ vtables, you rule out aggregate initialization once and for all. Your initialization will instead have to be spelled as frob_type g_obj(5, "name"), and even if you have no constructor arguments, C++ will generate a constructor to set the vtable pointer.
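
For reference, the C side of that comparison could look like the sketch below. frob_type, g_obj and vtable come from the text; the vtable contents, the count and name fields and the g_objs table standing in for generated data are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct frob;

/* A one-entry vtable is enough to show the point. */
struct frob_vtable {
    int (*get_count)(const struct frob *self);
};

typedef struct frob {
    const struct frob_vtable *vtable;
    int count;
    const char *name;
} frob_type;

int frob_get_count(const struct frob *self) { return self->count; }

const struct frob_vtable vtable = { frob_get_count };

/* Aggregate initialization: everything here is a link-time constant,
   so the compiler only has to emit initialized data, no code. */
frob_type g_obj = { &vtable, 5, "name" };

/* What a generator might emit: hundreds of these, still just data. */
frob_type g_objs[] = {
    { &vtable, 5, "name" },
    { &vtable, 6, "other" },
    { &vtable, 7, "third" }
};

/* The call through the explicit vtable pointer. */
int frob_count(const frob_type *obj)
{
    assert(obj != NULL && obj->vtable != NULL);
    return obj->vtable->get_count(obj);
}
```

Nothing runs before main() to set these objects up; the vtable pointer is just another word in the data section.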

The good news: at least the explicit reference to the vtable in our source code is gone. The bad news: with many objects, the C++ version compiles very slowly (I've checked with GNU and Green Hills C++). It also generates a much larger image, since it emits both global data objects and assembly code copying them into object members at run time. The latter code also costs you load time. And if you crash there, good luck figuring out the context. But as far as I'm concerned, the worst part is the build time explosion.

Yes, yes. It's not important. And it's FUD. And it's a tool problem. A "good" compiler could optimize the C++ version and generate the same assembly as the C version. And it could do it quickly. Those compiler writers are just lame. I completely agree with you, sir. Just go away already.

By the way, the same trade-off happens with C++ containers, like std::vector. They're better than {T*base;int size;} structures because you have the shortcut of operator[] (as opposed to array->base[i]). And because debuggers can gracefully display all elements of std::vector as a list of the right size. Some of the debuggers can. Sometimes. Sometimes it breaks. But when it doesn't, it's fun. But, again, you can't use aggregate initialization once your structure has a std::vector in it. And C++0x won't solve it, because its pseudo-aggregate initializers are syntactic sugar, and my problem here isn't the syntax, it's the build time.

And std::vector forces allocation of data on the heap (let's not discuss custom allocator templates, 'cause I'm gonna vomit). Can't have the base address of a std::vector point to a global variable generated specifically to hold its data.

I like things to point to globals generated to hold their data. Helps a lot when you debug, because your pointer is now firmly associated with a symbol table name. And no matter what memory-corrupting atrocity was committed by buggy code, that association will be there to help you figure things out. And heap corruption is very common in C++, because it's a completely unsafe language. So I care a lot about debugging core dumps with corrupted memory. Which is why base, size structures get my vote.
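
A sketch of what I mean by a base, size structure pointing to a named global. The int_array type and the g_task3_inputs names are invented for the example; the point is only that the data has its own symbol.

```c
#include <assert.h>
#include <stddef.h>

/* The {T*base; int size;} structure, for T = int. */
struct int_array {
    int *base;
    int size;
};

/* A global generated specifically to hold the data. Its name ends up
   in the symbol table, so a raw address in a core dump maps back to it. */
int g_task3_inputs_data[] = { 10, 20, 30 };

/* The array object itself: aggregate-initialized, no heap, no constructor. */
struct int_array g_task3_inputs = {
    g_task3_inputs_data,
    (int)(sizeof g_task3_inputs_data / sizeof g_task3_inputs_data[0])
};

/* Access is spelled array->base[i] rather than array[i]. */
int int_array_sum(const struct int_array *array)
{
    int i, sum = 0;
    assert(array != NULL && array->base != NULL);
    for (i = 0; i < array->size; ++i)
        sum += array->base[i];
    return sum;
}
```

If buggy code scribbles over g_task3_inputs.size, the debugger still tells you that base points at g_task3_inputs_data, and you can read the real length off the symbol.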

And that's an important point: you can live with just safety, or just control, and of course with both, but if you have neither safety nor control, then, sir, welcome to hell. Which is why OO C is worse than OO in Java or Lisp or PowerShell, but better than OO in C++.

And OO C is not all bad after all. Manually faking virtual functions has its benefits:

You can dynamically "change the object type" by changing the vtable pointer. I first saw this in POV-Ray, which has a vtable for circles and a vtable for ellipses. When a geometric transformation applied to a circle object transforms it to an ellipse, the vtable pointer is changed to point to the more generic and slower ellipse rendering functions. Neat. You could do this using C++-style OO by having another level of indirection, but with C, it's marginally faster, which can matter sometimes. And the C way is much neater, which is useful to balance the frustration caused by the drawbacks of fake OO. I use this trick a lot for debug plugins in my fake OO stuff.
Likewise, you can overwrite individual vtable entries. This changes the type of all objects of that "class" – primarily useful for logging and other sorts of debugging.
Sometimes you really don't need obj->vtable->func(obj, args) – say, obj->func(args) is good enough. And then you win.
You don't have to use a structure with named members to represent a vtable. If a bunch of functions have the same prototype, you can keep them in an array. You can then iterate over them or use an index into that array as a "member function pointer". This way, you can have a function calling a method in a bunch of objects, and the method to call will be a parameter of that function. The C++ member function pointers don't support the iterate-over-methods trick as efficiently, and their syntax is remarkably ugly.
That each function has a unique non-mangled name (as opposed to A::func, B::func, etc.) has the benefit of making the symbol table clean and independent of the non-portable C++ mangling. And you no longer depend on the varying definition look-up abilities of debuggers and IDEs (I don't like the lengthy disambiguation menus when I ask to show me the definition of "func", and these menus show up too often).
If you serialize the objects, or you have programs shoveling through process snapshots and analyzing them, the memory image of OO C objects is easier to deal with because it is more portable. Basically you only need to know the target endianness, the alignment requirements of built-in types, sizeof(enum) and the implementation of bitfields, if you use them. With C++, you need to know the layouts of objects when multiple/virtual inheritance/empty base class optimization/etc. is at play; this isn't portable across compilers. And even that knowledge is only useful if you know the members of each class; otherwise, you need to parse the layouts out of non-portable debug information databases – parsing C++ source code is not an option. With C, you can parse its files and find the struct definitions and figure out the layouts and it will be really easy and portable. Been there, done that.
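
Two of these tricks fit in one short sketch: changing the "type" of an object by swapping its vtable pointer, and keeping the vtable as an array so that a method index can be passed around as a parameter. This is loosely modeled on the circle/ellipse description above and is not POV-Ray's actual code; all names (shape, method_fn, shape_scale, sum_method and so on) are made up.

```c
#include <assert.h>
#include <stddef.h>

struct shape;

/* All methods share one prototype, so the vtable can be a plain array. */
typedef int (*method_fn)(struct shape *self);

/* Indices into the array double as "member function pointers". */
enum { METHOD_KIND, METHOD_EXTENT, NUM_METHODS };

struct shape {
    const method_fn *vtable;   /* array of NUM_METHODS entries */
    int rx, ry;                /* radii; equal for a circle */
};

int circle_kind(struct shape *s)    { (void)s; return 0; }
int circle_extent(struct shape *s)  { return 2 * s->rx; }   /* fast path */
int ellipse_kind(struct shape *s)   { (void)s; return 1; }
int ellipse_extent(struct shape *s) { return 2 * (s->rx > s->ry ? s->rx : s->ry); }

const method_fn circle_vtable[NUM_METHODS]  = { circle_kind, circle_extent };
const method_fn ellipse_vtable[NUM_METHODS] = { ellipse_kind, ellipse_extent };

struct shape g_shapes[2] = {
    { circle_vtable, 3, 3 },
    { circle_vtable, 2, 2 }
};

/* Non-uniform scaling turns a circle into an ellipse: the object
   "changes type" by pointing at the more generic functions. */
void shape_scale(struct shape *s, int kx, int ky)
{
    s->rx *= kx;
    s->ry *= ky;
    if (s->rx != s->ry)
        s->vtable = ellipse_vtable;
}

/* The method to call is a parameter: an index into each object's vtable. */
int sum_method(struct shape *objs, int n, int method)
{
    int i, total = 0;
    assert(method >= 0 && method < NUM_METHODS);
    for (i = 0; i < n; ++i)
        total += objs[i].vtable[method](&objs[i]);
    return total;
}
```

Calling shape_scale(&g_shapes[0], 1, 2) flips that one object over to ellipse_vtable without touching the other, and sum_method() can then be run with METHOD_KIND or METHOD_EXTENT, or in a loop over all NUM_METHODS indices.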

Of course, you can use all those tricks in a C++ object model when needed, and use virtual functions elsewhere. And if C++ didn't suck so much (for example, if code compiled reasonably fast and if it had a standard ABI and and and...), that would be the way to go. But since C++ just isn't good enough, and I have to use a pure C object model, I point out the benefits of the sucky thing that is OO C to make it a little less bitter.

Overall, OO C is passable, more so than the endless rebuild cycles caused by our previous C++ object model. So you have a bunch of lame stuff. Having to use a typedef because struct A can only be referred to as struct A, not A. No default function arguments. Having to declare variables at the beginning of the scope. (BTW, I use C89. I don't believe in C99. I'm a Luddite, you already know that, right? I don't think C99 is as portable as C89 here in 2008).

So, yeah. I do miss some C++ features, but it's a small itching here, a tiny scratching there, nothing serious. I can live with that as the price for not having my legs shot off by mysterious compile time explosions and other C++ goodies. OO C is good enough for me. Fanaticism? You be the judge.