Redundancy vs dependencies: which is worse?

May 27th, 2008

I believe that there are just two intrinsic forces in programming:

You want to minimize redundancy and, ideally, define every piece of knowledge once.
You want to minimize dependencies – A should depend on B only if it absolutely must.

I think that all other considerations are of the extrinsic real-world kind – domain modeling, usability, schedules, platforms, etc. I also think that I can show how any "good" programming practice is mainly aimed at minimizing redundancy, dependencies, or both. I even think that you can tell a "good" programmer from a "bad" one by their attitude towards redundancy and dependencies. The good ones hate them, the bad ones don't care.

If this idea looks idiotically oversimplified, note that I mean "programming aptitude" in a narrow sense of code quality. I've seen brilliant, cooperative people with uncanny algorithmic capabilities who still wrote awful code. I tried to figure it out and the common denominator seemed to be that they just didn't care about redundancy or dependencies, or even kinda liked them. Maybe it still looks idiotically oversimplified. Let's leave it at that, because it's not what I'm here to talk about.

I'm here to talk about the case when minimizing redundancy conflicts with minimizing dependencies. This case is basically code reuse beyond module boundaries. You can choose between having modules A and B using a module C doing something, or have them do it themselves. What's your call?

One strikingly dumb thing about this question is that it's centered around the term "module", which is vague and informal. However, "module" is what makes this a trade-off. Inside a module, of course you want to reuse the code, end of discussion. Why would anyone want to parse two command line options with two duplicated code snippets when you could use a function?

On the other hand, if two modules parse command lines, we can still factor out the parsing code, but we'd have to make it a third module. Alternatively, we can stuff it into a "utilities" module. The one affectionately called "the trash can". The one which won't link without a bunch of external libraries used to implement some of its handy functions. The one with the configuration which always gets misconfigured, and the initialization which never happens at the right time. You know, the utilities module.

I believe that years of experience barely matter in terms of knowledge. You don't learn at work at a pace anywhere near that of a full-time student. Experience mainly does two things: it builds character, and it destroys character. Case in point: young, passionate programmers are usually very happy to make the third module, nor do they cringe when they delve into the utility trash can. They're then understandably offended when their more seasoned colleagues, having noticed their latest "infrastructural" activity, reach out for the barf bags. This certainly illustrates either the character-building or the character-destroying power of experience, I just don't know which one.

No, seriously. Take command line parsing. You want common syntax for options, right? And you want some of them to accept values, right? And those values can be strings, and booleans, and integers, right? And integers can be decimal or hexadecimal, right? And they can be values of user-defined types, right? And they can have help strings, right? And you'd like to generate help screens from them, right? And GUIs with property pages? And read them from configuration files? And check the legality of flags or sets of flags, right?

Sure. It's not a big deal. Trivial, even. (If you're smart, everything is trivial until you fail completely due to exceeding complexity. And admit that you failed due to exceeding complexity. The former takes time to happen, the latter can never happen.) Quite some people have devoted several of the beautiful months of their youth to the problem of argument passing. Example: XParam, which calls itself "The Solution to Parameter Handling". Took >10K LOC the last time I checked. Comes with its own serialization framework. Rumors tell that its original host project uses <5% of its features.

Clarification: I'm not mocking the authors of XParam. Reason 1: Rumors tell they are pretty sharp. Reason 2: I'm really, really ashamed to admit this, but I once worked on a logging library called XLog. Took >10K LOC the last time I counted. Came with its own serialization framework. First-hand evidence tells that its host project uses 0% of its features. Ouch.

You know how I parse command line arguments in my modules these days? Like so:

for(i=0; i<argc; ++i) {
  if(strcmp(argv[i],"-trace")==0) {
    trace=1;
  }
}

I used C for the example because it's the ugliest language for string processing, and it's still a piece of cake. No, I don't get help screens. No, I don't get proper command line validation. So sue me. They're debugging options. It's good enough. Beats having everything depend on a 10K LOC command line parsing package. Not to mention a 50K LOC utility trash can full of toxic waste.

Modules are important. Module boundaries are important. A module is a piece of software that has:

A reasonably compact, stable interface. An unfortunate side effect of OO training is that compact interfaces aren't valued. It's considered OK to expose an object model with tens of classes and hairy data structures and poorly thought-out extensibility hooks. Furthermore, an unfortunate side effect of C++ classes is that "stable interface" is an oxymoron. But no matter; nothing prevents you from implementing a compact, stable interface.
Documentation. The semantics of the compact and stable interface are described somewhere. A really good module comes with example code. "Internal" modules lacking reasonably complete documentation suck, although aren't always avoidable.
Tests. I don't think unit-testing each class or function makes sense. What has to be tested is the "official" module interfaces, because if they work, the whole module can be considered working, and otherwise, it can't.
Reasonable size. A module should generally be between 1K and 30K LOC (the numbers are given in C++ KLOC units; for 4GLs, divide them by 4). Larger modules are a pile of mud. A system composed of a zillion tiny modules is itself a pile of mud.
Owner. The way to change a module is to convince the owner to make a change. The way to fix a bug is to report it to the owner. That way, you can count on the interface to be backed up by a consistent mental model making it work. The number of people capable of simultaneously maintaining this kind of mental model was experimentally determined to be 1.
Life cycle. Except for emergency bug fixes, changes to a module are batched into versions, which aren't released too frequently. Otherwise you can't have a tested, stable interface.

Pretty heavy.

Do I want to introduce a module to handle command line parsing? Do I really wish to become the honored owner of this bleeding-edge technology? Not now, thank you. Of course, it's the burnout speaking; it would be fun and it would be trivial, really. Luckily, not everyone around is lazy and grumpy like me. See? That guy over there already created a command line parsing module. And that other person here made one, too. Now, do I want my code to depend on their stuff?

Tough question. I've already compromised my reputation by refusing to properly deal with command line parsing. If my next move is refusing to use an Existing Solution, I will have proved the antisocial nature of my personality. Where's my team spirit? And still, I have my doubts. Is this really a module?

First and foremost, and this is the most annoying of all questions – who owns this piece of work? Sure, you thought it was trivial and you hacked it up. But are you willing to support it, or do you have someone in mind you'd like to transfer the ownership to? You probably don't even realize that it has to be supported, since it's so trivial. Oh, but you will. You will realize that it has to be supported.

I'm enough of a Kassandra to tell exactly when you'll realize it. It will be when the first completely idiotic change is made to your code by someone else. Not unlikely, tying it to another piece of "infrastructure", leading to a Gordian knot of dependencies. Ever noticed how infrastructure lovers always believe their module comes first in the dependency food chain and how it ultimately causes cyclic dependencies? So anyway, then, you'll understand. Of course, it will be too late as far as I'm concerned. My code got tied to yours, theirs, and everybody else's sticky infrastructure. Oopsie.

(The naive reader may ask, what can a command line parser possibly depend on? Oh, plenty of stuff. A serialization package. A parsing package. A colored terminal I/O package, for help screens. C++: a platform-specific package for accessing argc, argv before main() is called. C++: a singleton initialization management package. WTF is that, you ask? Get a barf bag and check out what Modern C++ Design has to say on the subject).

So no, I don't want to depend on anything without an owner. I see how this reasoning can be infuriating. Shifting the focus from software to wetware is a dirty trick loved by technically impotent pseudo-business-oriented middle-management loser types. Here's my attempt at distinguishing myself from their ilk: not only do I want to depend on stuff with an owner, but I require a happy owner at that. Contrary to a common managerial assumption (one of those which rarely hold but do keep managers sane), I don't believe in forcibly assigning ownership. If the owner doesn't like the module, expect some pretty lousy gardening job.

What about life cycle? My modules and your modules are released at different times. I might run into a need to check for compatibility with different versions of your stuff. My tests won't compile without your code, so now I need to always have it in my environment. What does your code depend on – is it a small, stable, defined set of things? I don't want to be stuck just because your new version drags in more dependencies. What if I need a feature? You don't seem to support floating point numbers. And I don't see abbreviations, either; I need them for a bunch of flags 'cause they're passed interactively all the time.

What about stable interface and semantics? Your previous version would accept command lines passing the same option multiple times, and take the last value. I have test scripts that count on that. Your new version reports an error, because now this syntax is reserved for lists (-frob=a -frob=b -frob=c passes the list a,b,c as the value of the frob option). Sigh. I guess it could be worse – you could make the string "a,b,c" from this command line, and then the problem would propagate deeper.

I could go on and on, about how your interface isn't a decent public interface (you don't seriously expect me to subclass EnumParser to handle flags with a fixed set of legal string values, do you?). Or about the size of your code, which at the moment dwarfs the size of my own module, so I'd have more command line parsing code than anything else in my tests. And how it hurts when you download the tests using a slow connection to the target machine. And how I don't like it when my tests crash inside your code, even when it's my fault, because you have hairy data structures that I don't like to inspect.

But you already got it – I don't want your code, because I'm an antisocial asshole that has no team spirit whatsoever. I'm going to parse arguments using 5 lines of C code. Worse, I'll make a function out of those five lines, prefix its name with the module name, and replicate it in all my modules. Yep, I'm the copy-paste programmer and you're the enlightened developer of the next generation command line parsing platform. Have it your way, and I'll have it my way.

To complete the bad impression I've been making here, I'll use a flawed real-world analogy. Think living organisms. They have a very serious Tower of Babel problem; redundancy is everywhere. I've heard that humans and octopuses have very similar eye structure, despite being descendant from a blind ancestor. Command line parsers?! Entire eyes, not the simplest piece of hardware with quite some neural complexity at the back-end, get developed independently. Redundancy can hardly get any worse. But it works, apparently better than coordinating everybody's efforts to evolve would work.

Redundancy sucks. Redundancy always means duplicated efforts, and sometimes interoperability problems. But dependencies are worse. The only reasonable thing to depend on is a full-fledged, real module, not an amorphous bunch of code. You can usually look at a problem and guess quite well if its solution has good chances to become a real module, with a real owner, with a stable interface making all its users happy enough. If these chances are low, pick redundancy. And estimate those chances conservatively, too. Redundancy is bad, but dependencies can actually paralyze you. I say – kill dependencies first.