<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Proper Fixation</title>
	<atom:link href="http://www.yosefk.com/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://www.yosefk.com/blog</link>
	<description>a substitute for anaesthesia</description>
	<pubDate>Fri, 05 Mar 2010 22:02:41 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
			<item>
		<title>API users &#038; API wrappers</title>
		<link>http://www.yosefk.com/blog/api-users-api-wrappers.html</link>
		<comments>http://www.yosefk.com/blog/api-users-api-wrappers.html#comments</comments>
		<pubDate>Fri, 05 Mar 2010 17:45:11 +0000</pubDate>
		<dc:creator>Yossi Kreinin</dc:creator>
		
		<category><![CDATA[wetware]]></category>

		<guid isPermaLink="false">http://www.yosefk.com/blog/?p=102</guid>
		<description><![CDATA[Focus on the satisfaction of the happy many who couldn't care less, securing the option to ignore the miserable few who think too much.]]></description>
			<content:encoded><![CDATA[<p>Suppose you have a sparse RAM API, something along the lines of:</p>
<ul>
<li> add_range(base, size)</li>
<li>write_ram(base, bytes)</li>
<li>read_ram(base, size)</li>
</ul>
<p>People use this API for things like running a simulated CPU:</p>
<ol>
<li>define the accessible memory with add_range()</li>
<li>pass the initial state to the simulator with write_ram()</li>
<li>run the simulation, get the final state with read_ram()</li>
</ol>
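<p>To make the API concrete, here is a minimal C sketch of a list-based implementation. The signatures are assumptions, not the post's: write_ram() and read_ram() take explicit lengths and buffers, errors come back as -1, and an access must fit entirely inside one range. The linear scan over the range list is the kind of slow lookup discussed further down.</p>

```c
#include <stdlib.h>
#include <string.h>

/* One range of simulated memory, backed by a host buffer. */
struct range {
  unsigned long base, size;
  unsigned char* data;
  struct range* next;
};

static struct range* ranges = NULL; /* newest range first */

/* Returns 0 on success, -1 if the host runs out of memory. */
int add_range(unsigned long base, unsigned long size) {
  struct range* r = malloc(sizeof *r);
  if (!r) return -1;
  r->data = calloc(size ? size : 1, 1);
  if (!r->data) { free(r); return -1; }
  r->base = base;
  r->size = size;
  r->next = ranges;
  ranges = r;
  return 0;
}

/* Linear scan for a range holding all of [addr, addr+n); NULL if none.
   Newer ranges are checked first, so a newer overlapping range hides an older one. */
static unsigned char* find(unsigned long addr, unsigned long n) {
  for (struct range* r = ranges; r; r = r->next) {
    if (addr >= r->base && addr - r->base + n <= r->size) {
      return r->data + (addr - r->base);
    }
  }
  return NULL;
}

int write_ram(unsigned long base, const void* bytes, unsigned long n) {
  unsigned char* p = find(base, n);
  if (!p) return -1; /* bad address */
  memcpy(p, bytes, n);
  return 0;
}

int read_ram(unsigned long base, void* out, unsigned long n) {
  const unsigned char* p = find(base, n);
  if (!p) return -1; /* bad address */
  memcpy(out, p, n);
  return 0;
}
```

<p>The tutorial flow above then becomes add_range(0x100000, 6), add_range(0x200000, 6), write_ram(0x100000, "abcdef", 6), running the simulator, and finally read_ram(0x200000, buf, 6).</p>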
<p>Suppose this API becomes a runaway success, with a whopping 10 programmers using it (very little irony here, &gt;95% of the APIs in this world are used exclusively by their designer). Then chances are that 9 of the 10 programmers are API users, and 1 of them is an API wrapper. Here&#8217;s what they do.</p>
<p><strong>API users</strong></p>
<p>The first thing the first API user does is call you. &#8220;How do I use this sparse thing of yours?&#8221; You point him to the short tutorial with the sample code. He says &#8220;Uhmm. Errm&#8230;&#8221;, which is userish for &#8220;Come <em>on</em>, I know you know that I&#8217;m lazy, and you know I know that docs lie. Come over here and type the code for me.&#8221; And you insist that it&#8217;s actually properly documented, but you will still come over, just because it&#8217;s him, and you personally copy the sample code into a source file of his:</p>
<pre>add_range(0x100000, 6) # input range
add_range(0x200000, 6) # output range
write_ram(0x100000, "abcdef")
# run a program converting the input to uppercase
print read_ram(0x200000, 6) # should print "ABCDEF"</pre>
<p>It runs. You use the opportunity to point out how <em>your</em> documentation is better than what he&#8217;s perhaps used to (though you totally understand his frustration with the state of documentation in this department, this company and this planet). Anyway, if he has any sort of problem or inconvenience with this thing, he can call you any time.</p>
<p>The next 8 API users copy your sample code themselves, some of them without you being aware that they use or even need this API. Congratulations! Your high personal quality standards and your user-centric approach have won you a near-monopoly position in the rapidly expanding local sparse RAM API <a title="Very little irony here, too" href="http://www.yosefk.com/blog/the-internal-free-market.html">market</a>.</p>
<p>Then some time later you stumble upon the following code:</p>
<pre>add_range(0x100000,256)
add_range(0x200000,1024)
add_range(0x300000,1024)
...
add_range(0xb00000,128)
...
add_range(0x2c00000,1024)
...</pre>
<p><em>Waitaminnit</em>.</p>
<p>You knew the API was a bit too low-level for the quite common case where you need to allocate a whole lot of objects, doesn&#8217;t matter where. In that case, something like base=allocate_range(size) would be better than add_range(base,size) - that way users don&#8217;t have to invent addresses they don&#8217;t care about. But it wasn&#8217;t immediately obvious how this should work (Nth call to allocate_range() appends a range to the last allocated address, but where should the first call to allocate_range() put things? What about mixing add_range() and allocate_range()? etc.)</p>
<p>So you figured you&#8217;d have add_range(), and then whoever needed to allocate lots of objects, doesn&#8217;t matter where, could just write a 5-line allocate_range() function good enough for him, though not good enough for a public API.</p>
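<p>Such a function might look like the following C sketch. The open questions above get arbitrary answers here: the first range goes at a made-up address (ALLOC_START, hypothetical), each range is rounded up to a 16-byte boundary (ALLOC_ALIGN, also hypothetical), and mixing with hand-placed add_range() calls is simply not handled. add_range() is stubbed out so that the sketch runs on its own.</p>

```c
/* Stand-in for the real add_range() so this sketch runs on its own:
   it only records the ranges it was given. */
#define MAX_RANGES 64
static unsigned long range_base[MAX_RANGES], range_size[MAX_RANGES];
static int num_ranges = 0;

void add_range(unsigned long base, unsigned long size) {
  if (num_ranges < MAX_RANGES) {
    range_base[num_ranges] = base;
    range_size[num_ranges] = size;
    num_ranges++;
  }
}

/* Hypothetical choices: where the first range goes, and the alignment. */
#define ALLOC_START 0x100000UL
#define ALLOC_ALIGN 16UL

/* Bump allocator: each call places its range right after the previous one,
   rounded up to ALLOC_ALIGN, so ranges never overlap and callers never
   invent addresses. Hand-placed add_range() calls are not taken into account. */
unsigned long allocate_range(unsigned long size) {
  static unsigned long next = ALLOC_START;
  unsigned long base = next;
  add_range(base, size);
  next = (base + size + ALLOC_ALIGN - 1) & ~(ALLOC_ALIGN - 1);
  return base;
}
```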
<p>But none of them did. Why? Isn&#8217;t it trivial to write such a function? Isn&#8217;t it ugly to hard-code arbitrary addresses? Doesn&#8217;t it feel silly to invent arbitrary addresses? Isn&#8217;t it actually hard to invent constant addresses when you put variable-sized data there, having to think about possible overlaps between ranges? Perhaps they don&#8217;t understand what a sparse RAM is? Very unlikely, that, considering their education and experience.</p>
<p>Somehow, something makes it very easy for them to copy sample code, but very hard to stray from that sample code in any <em>syntactically</em> substantial way. To them, it isn&#8217;t a sparse RAM you add ranges to. Rather, they think of it as a bunch of add_range() calls with hexadecimal parameters.</p>
<p>And add_range() with hex params they promptly will, just as it&#8217;s done in the sample. And they&#8217;ll complain about how this API is a bit awkward, with all these hex values and what-not.</p>
<p><strong>API wrappers</strong></p>
<p>If there&#8217;s someone who can see right through syntax deep into semantics, it&#8217;s the tenth user of your API, or more accurately, its first wrapper. The wrapper never actually uses an API directly in his &#8220;application code&#8221; as implied by the abbreviation, standing for &#8220;Application Programming Interface&#8221;. Rather, he wraps it with another (massive) layer of code, and has his application code use that layer.</p>
<p>The wrapper first comes to talk to you, either because he&#8217;s forced to use your API since everybody else already does, or because he doesn&#8217;t like to touch something as low-level as &#8220;RAM&#8221;, so if there&#8217;s already some API above it he prefers to go through that.</p>
<p>In your conversation, or more accurately, his monologue, he points out some admittedly interesting, though hardly pressing issues:</p>
<ul>
<li>It&#8217;s important to be able to trick a program using the sparse RAM API into allocating its data in specific address ranges, so that the resulting memory map is usable on certain hardware configurations and not just in simulations.</li>
<li>In particular, it is important to be able to extract the memory map from the section headers of executables in the ELF and COFF formats.</li>
<li>Since add_range() calls are costly, and memory map formats such as the S-Record effectively specify a lot of small, adjacent ranges, there is a need for a layer joining many such ranges.</li>
<li>An extensible API for the parsers of the various memory map formats is needed.</li>
<li>&#8230;</li>
</ul>
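<p>The range-joining layer from the third point could be sketched in C as follows. This assumes the parsed records arrive as (base, size) pairs in arbitrary order; join_spans() and struct span are hypothetical names. Records that touch or overlap are merged, so each merged span costs a single add_range() call.</p>

```c
#include <stdlib.h>

struct span { unsigned long base, size; };

/* qsort comparator: ascending by base address. */
static int by_base(const void* a, const void* b) {
  unsigned long x = ((const struct span*)a)->base;
  unsigned long y = ((const struct span*)b)->base;
  return (x > y) - (x < y);
}

/* Sorts spans by base and merges those that touch or overlap, in place.
   Returns the number of merged spans now at the front of the array. */
int join_spans(struct span* s, int n) {
  if (n == 0) return 0;
  qsort(s, n, sizeof *s, by_base);
  int out = 0;
  for (int i = 1; i < n; i++) {
    unsigned long end = s[out].base + s[out].size;
    if (s[i].base <= end) {
      /* extend the current span if this record reaches further */
      unsigned long new_end = s[i].base + s[i].size;
      if (new_end > end) s[out].size = new_end - s[out].base;
    } else {
      s[++out] = s[i]; /* gap: start a new span */
    }
  }
  return out + 1;
}
```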
<p>When you manage to terminate the monologuish conversation, he walks off to implement his sparse RAM API on top of yours. He calls it SParser (layer lovers, having to invent many names, frequently deteriorate into amateur copywriters).</p>
<p>When he&#8217;s done (which is never; let&#8217;s say &#8220;when he has something out there&#8221;), nobody uses SParser but him, though he markets it heavily. Users won&#8217;t rely on the author who cares about The Right Thing but not about their problems. Other wrappers never use his extra layers because they write their own extra layers.</p>
<p>However, even with one person using it, SParser is your biggest headache in the sparse RAM department.</p>
<p>For example, your original implementation used a list of ranges you (slowly) scanned through to find the range containing a given address. Now you want to replace this with a page table, so that, given an address, you simply index into a page array with its high bits and either find a page with the data or report a bad address error.</p>
<p>But this precludes &#8220;shadowing&#8221;, where you have overlapping segments, one hiding the other&#8217;s data. You thought of that as a bug in the user code your original implementation didn&#8217;t detect. The wrapper thought it was a feature, and SParser uses it all over to have data used at some point and then &#8220;hidden&#8221; later in the program.</p>
<p>So you can&#8217;t deploy your new implementation, speeding up the code of innocent users, without breaking the code of this wrapper.</p>
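<p>The page-table lookup described above might look like this C sketch, assuming 32-bit simulated addresses and 4 KiB pages (both hypothetical choices). A lookup is one shift and one array index. add_range() has nowhere to put a second, overlapping copy of a page, which is why shadowing can't survive the switch. A side effect of this particular sketch is that bad addresses are only caught at page granularity: right after add_range(0x100000, 6), address 0x100006 is still readable.</p>

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameters: 32-bit simulated addresses, 4 KiB pages. */
#define PAGE_BITS 12
#define PAGE_SIZE (1UL << PAGE_BITS)
#define NUM_PAGES (1UL << (32 - PAGE_BITS))

/* One entry per page of the address space; NULL means "no range here". */
static unsigned char* pages[NUM_PAGES];

/* Backs every page the range touches. An existing page is reused, so an
   overlapping range shares storage with the older one instead of hiding it. */
int add_range(uint32_t base, uint32_t size) {
  if (size == 0) return 0;
  if ((uint64_t)base + size > (1ULL << 32)) return -1;
  uint32_t last = (uint32_t)(((uint64_t)base + size - 1) >> PAGE_BITS);
  for (uint32_t p = base >> PAGE_BITS; p <= last; p++) {
    if (!pages[p] && !(pages[p] = calloc(PAGE_SIZE, 1))) return -1;
  }
  return 0;
}

/* The whole lookup: high bits index the page array, low bits are the
   offset within the page. NULL means a bad address. */
static unsigned char* ram_byte(uint32_t addr) {
  unsigned char* page = pages[addr >> PAGE_BITS];
  return page ? page + (addr & (PAGE_SIZE - 1)) : NULL;
}

int write_ram(uint32_t base, const void* bytes, uint32_t n) {
  const unsigned char* src = bytes;
  for (uint32_t i = 0; i < n; i++) {
    unsigned char* b = ram_byte(base + i);
    if (!b) return -1; /* bad address; earlier bytes were already written */
    *b = src[i];
  }
  return 0;
}

int read_ram(uint32_t base, void* out, uint32_t n) {
  unsigned char* dst = out;
  for (uint32_t i = 0; i < n; i++) {
    const unsigned char* b = ram_byte(base + i);
    if (!b) return -1; /* bad address */
    dst[i] = *b;
  }
  return 0;
}
```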
<p><strong>What to do</strong></p>
<p>Add an allocate_range() API ASAP, update the tutorial, walk over to your users to help replace their hex constants with allocate_range() calls. Deploy the implementation with the page table, and send the complaining wrapper to complain upwards along the chain of command.</p>
<p><strong>Why</strong></p>
<p>Your users will switch to allocate_range() and be happy, even more so when they get a speed-up from the switch to page tables. The wrapper, constituting the unhappy 10% of the stakeholders, will have no choice but to fix his code.</p>
<p>Ivan drank half a bottle of vodka and woke up with a headache. Boris drank a full bottle of vodka and woke up with a headache. Why drink less?</p>
<p>Users are many, they follow a predictable path (copy sample code) and are easily satisfied (just make it convenient for them to follow that path). Wrappers are few, they never fail to surprise (you wouldn&#8217;t guess what their layers do, let alone why), and always fail to be satisfied (they never use APIs and always wrap them). Why worry about the few?</p>
<p>The only reason this point is worth discussing at all is that users offend programmers while wrappers sweet-talk them, thus obscuring the obvious. It is natural to feel outrage when you give someone an add_range() function and a silly sample with hex in it, and not only do they mindlessly multiply hex constants throughout their code, but they blame <em>you</em> for the inconvenience of &#8220;your API with all the hex in it&#8221;. It is equally natural to be flattered when someone spends time discussing your work with you at a level of true understanding (&#8220;sparse RAM&#8221;) rather than superficial syntactic pattern matching (&#8220;add_range(hex)&#8221;).</p>
<p>He who sees through this optical illusion will focus on the satisfaction of the happy many who couldn&#8217;t care less, securing the option to ignore the miserable few who think too much.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yosefk.com/blog/api-users-api-wrappers.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Digital asses in the computing industry</title>
		<link>http://www.yosefk.com/blog/digital-asses-in-the-computing-industry.html</link>
		<comments>http://www.yosefk.com/blog/digital-asses-in-the-computing-industry.html#comments</comments>
		<pubDate>Sat, 06 Feb 2010 16:53:13 +0000</pubDate>
		<dc:creator>Yossi Kreinin</dc:creator>
		
		<category><![CDATA[wetware]]></category>

		<guid isPermaLink="false">http://www.yosefk.com/blog/?p=100</guid>
		<description><![CDATA[Ever noticed how academic asses are analog and industrial asses are digital? It&#8217;s legitimate to not know whether P equals NP, or to not know what x is if x*2=y but we don&#8217;t know y, for that matter. But it isn&#8217;t legitimate to not know how many cycles, megabytes or - the king of them [...]]]></description>
			<content:encoded><![CDATA[<p>Ever noticed how academic asses are analog and industrial asses are digital? It&#8217;s legitimate to not know whether P equals NP, or to not know what x is if x*2=y but we don&#8217;t know y, for that matter. But it isn&#8217;t legitimate to not know how many cycles, megabytes or - the king of them all - man-months it will take, so numbers have to be pulled out of one&#8217;s ass.</p>
<p>The interesting thing is that the ass adapts, that the numbers pulled out of this unconventional digital device aren&#8217;t pure noise. Is it because digital asses know to synchronize? Your off-by-2-months estimation is fine as long as other estimations are off by 5. But it&#8217;s not just that, there must be something else, a mystery waiting to be discovered. We need a theory of computational proctology.</p>
<p>Ever noticed how painful the act of anal estimation is for the untrained, um, mind, but then eventually people actually get addicted to it? Much like managers who learn that problems can be made to go away by means such as saying a firm &#8220;No&#8221;, without the much harder process of understanding the problem, not to mention solving it? Anal prophecy is to the technical &#8220;expert&#8221; the same raw enjoyment that the triumph of power over knowledge is to the manager. &#8220;Your powers are nothing compared to mine!&#8221;</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/NPVlljVWqBg&amp;hl=en_US&amp;fs=1&amp;rel=0" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/NPVlljVWqBg&amp;hl=en_US&amp;fs=1&amp;rel=0" allowscriptaccess="always" allowfullscreen="true"></embed></object><br />
There once was a company called ArsDigita (I warmly recommend the founder&#8217;s <a href="http://blogs.law.harvard.edu/philg/">blog</a> and have his <a href="http://en.wikipedia.org/wiki/Greenspun%27s_Tenth_Rule">Tenth Rule</a> tattooed all over my psyche), a name I tend to misread as &#8220;ArseDigital&#8221; - a tribute to an important method of numerical analysis and estimation in the computing industry.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yosefk.com/blog/digital-asses-in-the-computing-industry.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>The Virtue of a Manager</title>
		<link>http://www.yosefk.com/blog/the-virtue-of-a-manager.html</link>
		<comments>http://www.yosefk.com/blog/the-virtue-of-a-manager.html#comments</comments>
		<pubDate>Tue, 24 Nov 2009 22:39:40 +0000</pubDate>
		<dc:creator>Yossi Kreinin</dc:creator>
		
		<category><![CDATA[wetware]]></category>

		<guid isPermaLink="false">http://www.yosefk.com/blog/?p=96</guid>
		<description><![CDATA[The defining trait of a manager is the distinctly wide gap between responsibility and understanding.]]></description>
			<content:encoded><![CDATA[<p>I never managed a group larger than 5 people, luckily for the people in the group (perhaps more so for those remaining outside). Good managers are hard to find, which is the basis of my self-motivating motto: &#8220;This job could have been done worse&#8221;. Such is the background for the hereby presented pearls of wisdom assortment. As to &#8220;The Virtue of a Manager&#8221; title, it&#8217;s a ripoff of Paul Krugman&#8217;s exquisite title &#8220;<a title="The Conscience of a Liberal" href="http://en.wikipedia.org/wiki/The_Conscience_of_a_Liberal">The Conscience of a Liberal</a>&#8220;. &#8220;The Private Part of a Self-Important Self-Description&#8221; is a great template.</p>
<p>***</p>
<p>A prime virtue of a manager is the ability to take pride in someone else&#8217;s work.</p>
<p>No, seriously. We&#8217;ve recently deployed a debugger internally and an algorithm developer had a look at it. I knew it was good, but it&#8217;s used to debug the sort of thing algo devs hate: code with an anal-retentive performance focus. So the last thing I expected was praise, but praise it the guy did.</p>
<p>Now, I had previously known proud moments from having done things myself, and here I had this proud moment with 90% of the work done by someone else. And I&#8217;m telling you, it was just like the real thing.</p>
<p>***</p>
<p>The defining trait of a manager is the distinctly wide gap between responsibility and understanding.</p>
<p>By far the funniest spot to have a gap at, hence the easiest target for a low blow: try to make jokes about a gap between one&#8217;s teeth and you&#8217;ll soon be exhausted, but this here is gold. This is mean-spirited though. Imagine living with a gap between <em>your</em> responsibility and <em>your</em> understanding and everybody laughing at you - how would that make you feel? Show compassion.</p>
<p>***</p>
<p>One can have the title of a manager or nominal reports for any of a number of reasons:</p>
<ul>
<li>An HR system with per-title wage ceilings: can&#8217;t give someone a raise without faking a title.</li>
<li>A diametrically opposed case: some forms of brain damage cause people to accept <em>lower</em> paychecks given more impressive titles, larger rooms, etc.</li>
<li><a title="We've seen that someone before." href="http://www.yosefk.com/blog/the-nomadic-programmer.html">Someone</a> is too senior to report to a team leader but doesn&#8217;t want a team to report to him, either.</li>
<li>&#8230;</li>
</ul>
<p>In a roomful of managers, how do you find the real ones among this variety - not &#8220;real&#8221; as opposed to incompetent or unimportant, but &#8220;real&#8221; as opposed to fake?</p>
<p>There are several cues, for example, only real managers can have other managers report to them. But the perfect, if-and-only-if discriminator is that real managers don&#8217;t write code. (The precise rule is that they can spend up to 2% of their time on a favorite piece of code without getting disqualified.)</p>
<p>***</p>
<p>The principal function of a manager is being the responsible adult.</p>
<p>Some managers occasionally point this out in frustration, partly because they mourn their technical skills, which dry up during their current gig where they only get to exercise adulthood, and partly because being the adult means getting tired of the annoying kids. A gal who both managed and met literally hundreds of managers during her career in some consulting agency said &#8220;<em>Now</em> I really understand management&#8221; when she got to babysit.</p>
<p>This is why I have a hard time believing management can be taught - you can&#8217;t teach adulthood; it can only result from people growing up by themselves. I&#8217;m not sure if this feeling is fully aligned with reality, but quite a few very successful managers never went to a management school (at least one of those is <a title="Read John Walker. Subscribe today." href="http://www.fourmilab.ch/fourmilog/archives/2008-11/001074.html">somewhat critical</a> of MBAs), and some of those who went say it was worthless in terms of useful things learned.</p>
<p>The opposite is also true: childishness is fitting for a programmer. We were two fake code-writing managers in a meeting with one real one, and at one point the real one said: &#8220;Let&#8217;s not be childish about this&#8221;. The technically correct reply to her would have been &#8220;I&#8217;M NOT CHILDISH ABOUT THIS, HE IS!&#8221;, but I suppressed it for tactical reasons. Some time later I told her: &#8220;You don&#8217;t want us to stop being childish about this, not as long as you&#8217;re interested in our output as programmers. Recall: the reason <em>you</em> aren&#8217;t still programming is because of not being childish enough to truly enjoy this sort of game.&#8221;</p>
<p>And in fact since she started managing 20 programmers, she&#8217;s been talking about her work all the time, which she didn&#8217;t when she was programming. Well, some people like to play and some prefer to babysit. (I&#8217;m not sure where this leaves the quasi-managers who write code; presumably some are the elder and most responsible kid while others are the most restless who invent games for the gang.)</p>
<p>***</p>
<p>I&#8217;ve recently got a driving license. One thing I learned was that someone pushing his (presumably broken) car along the road is a &#8220;driver&#8221; as far as the law is concerned. I find this counter-intuitive, probably because pushing a car is not categorized in my head as &#8220;driving experience&#8221;, but, at least in Israel, that&#8217;s the law.</p>
<p>Likewise, doing the work of three people is not what most of us associate with &#8220;managerial responsibility&#8221;. However, if you&#8217;re given two reports without a drive of their own to work, that&#8217;s what your responsibility will be.</p>
<p>***</p>
<p>A manager will have favorite words. For example: acute (critical), priorities, agenda, rationale, integrity (shoot this manager first), responsibility (ownership), stakeholder.</p>
<p>Keep laughing at them. Once you become a manager, you&#8217;ll have favorite words whether you want them or not - it is useless to resist the dynamics inherent to your situation. My favorite word is &#8220;dynamics&#8221;. Its connotations are deep and its applicability wide - heartily recommended.</p>
<p>***</p>
<p>Managers get to do a lot of knowledge-free decision making, which necessarily drives them insane. Here&#8217;s how the manager&#8217;s bipolar disorder works.</p>
<p>During manic periods, the manager is the only one who can do anything around here. This frequently happens when the manager is under external pressure and feels that control is slipping out of his hands. He&#8217;s trying to compensate for his lack of knowledge by immense concentration and willpower. (Managers always have ample emergency supplies of both.) &#8220;Concentration&#8221; translates to an ability to derive general and far-reaching conclusions from insignificant details, then &#8220;willpower&#8221; translates to aggression.</p>
<p>Then depression follows: &#8220;Don&#8217;t bother me with details&#8221;. This results partly from exhaustion quickly arrived at during the mania (especially if reports were wise enough to not argue with the manager, letting his efforts defeat their own purpose). The manager has delivered his trademark concentration and willpower, so he no longer feels guilty on that front. However, he&#8217;s overwhelmed by information and (rightly) feels that he doesn&#8217;t know what&#8217;s going on. He decides it is none of his business and concentrates on the Big Picture (does nothing). Usually, the cycle repeats upon a new wave of external pressure.</p>
<p>Awareness of the management cycle on the part of the manager himself can help soften the cycle but not eliminate it. It is up to reports to apply counter-cycle measures by scheduling most work into depression periods, when it is least disrupted. Special attention must be given to long-term projects, frequently characterized by a prolonged depressive apathy period at the beginning followed by a period of manic frenzy lasting until the end.</p>
<p>***</p>
<p>There&#8217;s a naive <a href="http://en.wikipedia.org/wiki/Triune_brain">brain model</a> in the spirit of &#8220;the brain has a reptilian part, a mammal part and a human part&#8221;. For example, if a student fails to answer a question in an oral exam with his human brain, the mammal brain feels bad about it and complains to the reptilian brain. The reptilian brain then cheerfully replies, &#8220;Who&#8217;s causing the trouble? Oh, that little guy behind the table? Not to worry - I&#8217;ll kill him&#8221;. The higher brains then supposedly suppress this - &#8220;What do you think this is, reptile - Jurassic Park?&#8221;, and the tension is translated into sweating.</p>
<p>The manager is the team&#8217;s reptilian brain; he doesn&#8217;t know enough to do real thinking, but he&#8217;s good at &#8220;taking responsibility&#8221;, bargaining, fighting, socializing, etc. A manager doesn&#8217;t know how to implement the feature, except for suspecting, based on experience, that it will conflict with a couple other features and it will take a week or three for the whole thing to stabilize (with him taking the heat when things break during those weeks). Therefore, instead of technical advice (which he might be otherwise qualified to give), he&#8217;ll propose something which solves the problem at his favorite social plane:</p>
<ul>
<li>Prioritize the feature away, delaying the implementation until forever</li>
<li>Negotiate the feature away, by talking whoever wants it out of it in exchange for something</li>
<li>Redefine the feature away, by reducing the scope to the few scenarios which absolutely can&#8217;t be ignored</li>
<li>&#8230;</li>
</ul>
<p>Do not drag management into anything you actually want solved. Presented with a question, the manager will answer it by killing the little guy behind the table, so only go to him if you really want that. And once awakened, he might take a lot of sweat to suppress. (If he&#8217;s really a programmer posing as a quasi-manager, the chances of a real solution can be even worse: he&#8217;s more likely to feel guilty about his managerial ability and use the opportunity to exercise and develop that ability, instead of using his technical ability to think about the issue.)</p>
<p>***</p>
<p>There&#8217;s this quote from The Mythical Man-Month, supposedly by a pessimistic manager:</p>
<blockquote><p>All programmers are optimists.  Perhaps this modern sorcery especially attracts those who believe in happy endings and fairy god-mothers.  Perhaps the hundreds of nitty frustrations drive away all but those who habitually focus on the end goal. Perhaps it is merely that computers are young, programmers are younger, and the young are always optimists.  But however the selection process works, the result is indisputable: &#8220;This time it will surely run,&#8221; or &#8220;I just found the last bug.&#8221;</p></blockquote>
<p>This is backwards. In reality, programmers are the more pessimistic people. Perhaps it&#8217;s because experience teaches programmers that programs always have bugs while teaching managers that programs always ship. Perhaps it&#8217;s because the programmer is the one with the actual knowledge, and the ignorant are always optimists. But however the selection process works, how many programmers have you seen saying &#8220;it will never work&#8221; and how many managers?</p>
<p>A programmer might be more optimistic locally, hoping in vain to have fixed this one piece of code where he has the illusion of complete understanding. However, it is invariably the manager who believes that <em>everything</em> will work out. A programmer can&#8217;t really believe that because there are so many things nobody even understands that are yet to be faced.</p>
<p>But the manager is used to knowing little and understanding less, and thus has learned to translate uncertainty to optimism. In fact a programmer can learn it, too, in the areas which are of little interest to him. I know a programmer who doesn&#8217;t care about optimization and who consequently describes others&#8217; efforts to fit a program into a given performance budget as doomed to success: &#8220;It runs at the word of command&#8221; - a programmer&#8217;s expression of the managerial worldview worthy of a seasoned manager.</p>
<p>***</p>
<p>We don&#8217;t know how to test for programming ability. The best tech companies spend 5 to 10 interviews to solidly confirm that the candidate knows what is taught during the first 1.5 years of an undergraduate CS curriculum. Other processes measure less accurately by asking less relevant questions; the inaccuracy is somewhat ameliorated by the lack of precision - the non-uniform quirks of interviewers and general randomness of the process eliminate biases, causing all kinds of good candidates to sneak through the gates.</p>
<p>It is well known that we can&#8217;t find out during the interview what we inevitably find out once someone gets the job, but what are the corollaries? Here&#8217;s one I&#8217;ve heard a lot: trust recommendations more than interviews. Here&#8217;s another I haven&#8217;t: let others interview and get the new hires, then steal the best.</p>
<p>(Objection: the first recommendation is good for the company while the other is only good for the manager following it. Well, &#8220;competition between managers over team members isn&#8217;t a zero-sum game - it improves teamwork across the company&#8221;, this one we weasel out of in a snap.)</p>
<p>***</p>
<p>We have a VIP club at work called Bottleneck, its principal activity being the collective purchasing and consumption of alcoholic beverages. The club operates during work hours (regular meetings held on Thursdays, emergency meetings scheduled upon arrival of packages from abroad). Our room being the headquarters, I&#8217;m naturally a member. By now the club has shifted to high-end liquors at prices causing the consumption to contract to a sip per cup of coffee, but originally it was affordable to actually drink.</p>
<p>I noticed that minor alcoholic intoxication has a notable impact on my programming ability. I can still lay my hands on the right variable, but by the time I do I forget what you do next with these things. There&#8217;s that handy member somewhere in it, dot something, but dot what?</p>
<p>However, managerial ability is not affected. Things I can do just fine following a meeting of the Bottleneck club include progress monitoring, planning, risk assessment, general technical advice, and requirement negotiation. Now that I think of it, perhaps the managerial functions are affected for the better.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yosefk.com/blog/the-virtue-of-a-manager.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Getting the call stack without a frame pointer</title>
		<link>http://www.yosefk.com/blog/getting-the-call-stack-without-a-frame-pointer.html</link>
		<comments>http://www.yosefk.com/blog/getting-the-call-stack-without-a-frame-pointer.html#comments</comments>
		<pubDate>Fri, 09 Oct 2009 21:33:17 +0000</pubDate>
		<dc:creator>Yossi Kreinin</dc:creator>
		
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.yosefk.com/blog/?p=94</guid>
		<description><![CDATA[Everything I know about getting the current call stack of C or C++ programs, including ones compiled with -fomit-frame-pointer or an equivalent, with or without a debugger.]]></description>
			<content:encoded><![CDATA[<p>Everything I know about getting the current call stack of C or C++ programs, including ones compiled with -fomit-frame-pointer or an equivalent, with or without a debugger. Hardly entertaining except for those specifically wishing to lay their hands on call stacks.</p>
<p>We&#8217;ll start with trivia people sharing my unhealthy interest in call stacks probably know. There are two contexts where you might want to get a call stack:</p>
<ol>
<li>Look at the call stack in a debugger to see what the program is doing.</li>
<li>Get a representation of the call stack inside the program itself. For example, a memory profiler might want to attach to each memory block the call stack identifying where it was allocated, to see who allocates the most.</li>
</ol>
<p>Sometimes (2) can be implemented using (1) by running a debugger programmatically and asking it for the current call stack, and sometimes it can&#8217;t (too much communication overhead, or no debugger available - for example, a program stripped of debugging information and running on a standalone embedded board).</p>
<p>The straightforward way of getting the current call stack, used both by debuggers and by programs curious about their own stacks, relies on the frame pointer. The idea is that a machine register is reserved for keeping a pointer into the stack, called the frame pointer, and every function is compiled to do the following in its prologue:</p>
<ul>
<li>Push the return address to the stack</li>
<li>Push the frame pointer to the stack</li>
<li>Save the address of the resulting two-pointer structure to the frame pointer register</li>
</ul>
<p>This creates a linked list on the stack, with every node keeping a return address - this list is the call stack (and this is why debuggers show the <em>points of return</em> from function calls and not the <em>points of call</em>, a bit annoyingly for function calls spanning many source code lines - in fact we get the return stack and not the call stack). Here&#8217;s how you get this list from within a program:</p>
<pre>struct stack_frame {
  struct stack_frame* next;
  void* ret;
};
int get_call_stack(void** retaddrs, int max_size) {
  /* x86/gcc-specific: this tells gcc that the fp
     variable should be an alias to the %ebp register
     which keeps the frame pointer */
  register struct stack_frame* fp asm("ebp");
  /* the rest just walks through the linked list */
  struct stack_frame* frame = fp;
  int i = 0;
  while(frame) {
    if(i &lt; max_size) {
      retaddrs[i++] = frame-&gt;ret;
    }
    frame = frame-&gt;next;
  }
  return i;
}</pre>
<p>The code for getting the list head pointer depends on the platform; the list structure itself is common to many machines and compilers. The return addresses may be converted to function names and source line numbers with <a href="http://linuxcommand.org/man_pages/addr2line1.html">addr2line</a> -f or similar. When the program can&#8217;t access its own debug info during its execution, as in the case of embedded devices, the translation from addresses to names will be a separate offline step.</p>
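<p>Since only the list head is platform-specific, the walk itself can be written once and tested without touching the real stack. A minimal sketch, assuming the same two-word frame layout as struct stack_frame above (the caller&#8217;s saved frame pointer first, then the return address); walk_frames is a hypothetical name, and unlike the loop above it stops as soon as the buffer is full. With gcc or clang the head could come from __builtin_frame_address(0) instead of naming %ebp, which is only meaningful when frame pointers are kept.</p>

```c
#include <stddef.h>

/* Assumed layout, as in struct stack_frame above: the caller's saved
   frame pointer, followed by the return address. */
struct stack_frame {
  struct stack_frame *next;
  void *ret;
};

/* Hypothetical helper: collect up to max_size return addresses starting
   from an explicit list head. Stops at a NULL link or a full buffer. */
int walk_frames(const struct stack_frame *frame, void **retaddrs,
                int max_size) {
  int i = 0;
  while (frame != NULL && i < max_size) {
    retaddrs[i++] = frame->ret;
    frame = frame->next;
  }
  return i;
}
```

<p>Like the original loop, this relies on the outermost frame pointer being NULL; on a real stack, frames from code compiled without frame pointers can break that assumption, so a small max_size is the safer choice.</p>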
<p>The whole frame pointer business is fairly widely documented, with a bunch of source code available centered around getting the call stack that way. I think the GNU <a href="http://www.gnu.org/software/libc/manual/html_node/Backtraces.html">backtrace</a> function and the Windows <a href="http://msdn.microsoft.com/en-us/library/ms680650(VS.85).aspx">StackWalk64</a> function, which these days are probably a better alternative than code snippets like the one above, also use this linked list when available.</p>
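<p>For comparison, here is roughly what using the GNU function looks like. A sketch assuming glibc, where backtrace() and backtrace_symbols_fd() are declared in &lt;execinfo.h&gt;; print_call_stack is a hypothetical wrapper name. Function names in the printed output generally require linking with -rdynamic; without it, the raw addresses can still be fed to addr2line.</p>

```c
#include <execinfo.h>
#include <unistd.h>

/* Hypothetical wrapper: capture up to max return addresses of the current
   call stack with glibc's backtrace() and print them, one per line, to
   stderr. Returns the number of addresses captured. */
int print_call_stack(void **buf, int max) {
  int n = backtrace(buf, max);
  backtrace_symbols_fd(buf, n, STDERR_FILENO);
  return n;
}
```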
<p>Now, what happens if the compiler is told to avoid generating the code maintaining the frame pointer (-fomit-frame-pointer for gcc, /Oy for MSVC), or not told to override its default behavior of not generating such code (-ga for Green Hills C++)?</p>
<p>Admittedly it doesn&#8217;t appear to be such a good idea to make debugging that much harder in order to save a few instructions. However, there are reasons to do so. For example, one common consequence of Software Design is lots of functions doing little or nothing except for delegating their work to another function. Without frame pointer maintenance, such a call is just a jump instruction - your callee function will return directly to the address saved by your caller. With frame pointers, you need a whole prologue and epilogue here. Anyway, we won&#8217;t discuss the benefits of frame pointer omission, since measuring the overhead for your particular code will be more reliable than such a discussion.</p>
<p>Compiling without frame pointers hits code trying to obtain its own context harder than it hits debuggers, because debuggers don&#8217;t really need frame pointers and only (sometimes) rely on them for simplicity of implementation. Given a return address, a debugger can tell:</p>
<ol>
<li>Which function it belongs to (unlike the program itself, a debugger is necessarily supposed to have access to the symbol table)</li>
<li>Where that function keeps the return address (the compiler knows that, so it can tell the debugger)</li>
<li>The amount by which the function decrements the stack pointer, assuming the stack grows downwards - again something the compiler knows. Now that the debugger knows the previous return address and the previous stack pointer, it can go back to step 1.</li>
</ol>
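<p>The three steps above can be simulated without a real debugger. A minimal sketch under stated assumptions: the per-function table (struct unwind_info) is a hypothetical stand-in for what the compiler emits as debug info, the &#8220;stack&#8221; is an array of words growing downwards, and offsets are in words rather than bytes to keep it short.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-function record standing in for compiler-generated
   unwind info. Offsets are in stack words, not bytes. */
struct unwind_info {
  uintptr_t start, end;  /* code address range [start, end) */
  size_t frame_words;    /* how much the prologue lowers the stack pointer */
  size_t ra_slot;        /* where the return address is saved, relative to sp */
};

/* Step 1: which function does this code address belong to? */
static const struct unwind_info *find_func(const struct unwind_info *tab,
                                           size_t n, uintptr_t pc) {
  for (size_t i = 0; i < n; i++)
    if (pc >= tab[i].start && pc < tab[i].end)
      return &tab[i];
  return NULL;
}

/* Unwind a simulated downward-growing stack, where stack[sp] is the word
   at stack pointer sp. Stops at an address outside all known functions. */
int unwind(const struct unwind_info *tab, size_t n, const uintptr_t *stack,
           size_t sp, uintptr_t pc, uintptr_t *out, int max) {
  int depth = 0;
  while (depth < max) {
    const struct unwind_info *f = find_func(tab, n, pc);
    if (f == NULL)
      break;
    out[depth++] = pc;
    pc = stack[sp + f->ra_slot];  /* step 2: the saved return address */
    sp += f->frame_words;         /* step 3: the caller's stack pointer */
  }
  return depth;
}
```

<p>DWARF call frame information describes essentially these two facts per code address range (where the return address is saved and how to recover the caller&#8217;s stack pointer), in a far more general form.</p>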
<p>So a debugger can do just fine without frame pointers as long as the compiler gives it enough information about the layout of the stack. I&#8217;ve been debugging without frame pointers for a long time with the Green Hills MULTI debugger, which uses a proprietary debug info format. More recently the DWARF format, gcc and gdb seem to have caught up, and now programs compiled with -fasynchronous-unwind-tables -fomit-frame-pointer are debuggable with gdb. The information generated by -fasynchronous-unwind-tables goes to an ELF section called .eh_frame, indexed by a lookup table in a separate section called .eh_frame_hdr.</p>
<p>Not only will gdb use this information, but the GNU backtrace function appears to be using it as well (it doesn&#8217;t work under -fomit-frame-pointer but apparently does work when you add -fasynchronous-unwind-tables - <em>although the docs explicitly say:</em> &#8220;frame pointer elimination will stop backtrace from interpreting the stack contents correctly&#8221;). Nor is this unwind information stripped from the program - it&#8217;s not implemented as a &#8220;normal&#8221; debug information section but as allocated data, so it&#8217;s always available to a program (in particular, to the backtrace function).</p>
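<p>The unwinder behind this can also be called directly. A sketch assuming gcc or clang on a platform using libgcc&#8217;s unwinder (&lt;unwind.h&gt;), which walks the stack using the .eh_frame tables rather than frame pointers; as far as I know, glibc&#8217;s backtrace is built on the same _Unwind_Backtrace call on common targets. The wrapper and state struct names are hypothetical.</p>

```c
#include <unwind.h>

/* Hypothetical state threaded through the unwinder callback. */
struct trace_state {
  void **addrs;
  int depth, max;
};

/* Called by _Unwind_Backtrace once per frame, innermost first. */
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *ctx,
                                        void *arg) {
  struct trace_state *st = arg;
  if (st->depth >= st->max)
    return _URC_END_OF_STACK;  /* any value but _URC_NO_REASON stops the walk */
  st->addrs[st->depth++] = (void *)_Unwind_GetIP(ctx);
  return _URC_NO_REASON;       /* continue with the caller's frame */
}

/* Collect up to max code addresses using unwind tables, not frame pointers. */
int get_call_stack_unwind(void **addrs, int max) {
  struct trace_state st = { addrs, 0, max };
  _Unwind_Backtrace(record_frame, &st);
  return st.depth;
}
```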
<p>So under gcc, all call stack problems seem to be solved - unless you trust the docs (!?), or unless some code isn&#8217;t compiled with the right flags because of not being up to date or someone being too greedy to allocate space for a debug info section. Outside gcc, or more precisely DWARF, I don&#8217;t think a stripped program can access such debug info.</p>
<p>Is there a way to get a call stack without a frame pointer, without a debugger and without debug info?</p>
<p>For years I was sure that the answer was &#8220;no&#8221;, so some things would only work under a separate build mode - just like the release build but with frame pointers. Then one time the Green Hills debugger failed to list the call stack for some reason, as it sometimes does, but this time we really wanted to decipher it. And we figured that we could in fact do the same thing the debugger does, except that we&#8217;d understand from the assembly code what the debugger usually understood from debug information.</p>
<p>Specifically, to understand where the return address is kept and by what amount the stack pointer is decremented, you need to find the instructions doing (or undoing) these things in the prologue (or the epilogue) of the function. This worked, but due to either inertia or stupidity it took me months to realize that you can write code doing this. Anyway, here&#8217;s how it works on a 32-bit MIPS processor under the Green Hills compiler. The prologue code of a function will contain instructions like these:</p>
<pre>main+0: 27bdffe8 addiu sp, sp, -0x18
main+4: afbf0014 sw    r31, 0x14(sp)</pre>
<p>The add immediate instruction decrements the stack pointer, and the store word instruction saves the return address from the register where the caller put it to some place on the stack. The high 16 bits of these instructions don&#8217;t depend on the function, encoding the &#8220;addiu sp, sp&#8221; and the &#8220;sw r31, &#8230;(sp)&#8221; parts. The low 16 bits encode a signed offset. So we can obtain the call stack by having our code disassemble itself, thusly:</p>
<pre>/* get previous stack pointer and return address
   given the current ones */
int get_prev_sp_ra(void** prev_sp, void** prev_ra,
                   void* sp, void* ra) {
  unsigned* wra = (unsigned*)ra;
  int spofft;
  /* scan towards the beginning of the function -
     addiu sp,sp,spofft should be the first instruction */
  while((*wra &gt;&gt; 16) != 0x27bd) {
    /* test for "scanned too much" elided */
    wra--;
  }
  /* sign-extend the 16-bit immediate; spofft is negative */
  spofft = (short)(*wra &amp; 0xffff);
  *prev_sp = (char*)sp - spofft;
  /* now scan forward for sw r31,raofft(sp) */
  while(wra &lt; (unsigned*)ra) {
    if((*wra &gt;&gt; 16) == 0xafbf) {
      int raofft = (short)(*wra &amp; 0xffff); /* sign-extend */
      *prev_ra = *(void**)((char*)sp + raofft);
      return 1;
    }
    wra++;
  }
  return 0; /* failed to find where ra is saved */
}</pre>
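<p>The two magic numbers follow from the MIPS32 I-type instruction format (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate), with addiu being opcode 9, sw opcode 0x2b, sp register 29 and ra (r31) register 31. A small sketch, checkable on any host, that builds the upper halves from these fields and sign-extends the immediate without shifting negative values; the function names are hypothetical.</p>

```c
#include <stdint.h>

/* Upper 16 bits of a MIPS32 I-type instruction: opcode(6) rs(5) rt(5). */
uint32_t mips_itype_hi(uint32_t opcode, uint32_t rs, uint32_t rt) {
  return (opcode << 10) | (rs << 5) | rt;
}

/* Sign-extend the low 16 bits of an instruction word. */
int32_t mips_imm16(uint32_t insn) {
  int32_t imm = (int32_t)(insn & 0xffffu);
  return imm >= 0x8000 ? imm - 0x10000 : imm;
}

/* If insn's upper half equals hi16, store its immediate and return 1. */
int mips_match(uint32_t insn, uint32_t hi16, int32_t *imm) {
  if ((insn >> 16) != hi16)
    return 0;
  *imm = mips_imm16(insn);
  return 1;
}
```

<p>mips_itype_hi(9, 29, 29) and mips_itype_hi(0x2b, 29, 31) give the 0x27bd and 0xafbf constants get_prev_sp_ra compares against, and decoding the main+0/main+4 words above yields the offsets -0x18 and 0x14.</p>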
<p>The call stack will then be produced by the following loop:</p>
<pre>int get_call_stack_no_fp(void** retaddrs, int max_size) {
  void* sp = get_sp(); /* stack pointer register */
  void* ra = get_ra(); /* return address register */
  /* adjust sp by the offset by which this function
     has just decremented it */
  unsigned* funcbase = (unsigned*)(int)&amp;get_call_stack_no_fp;
  /* funcbase points to an addiu sp,sp,spofft instruction */
  int spofft = (short)(*funcbase &amp; 0xffff); /* sign-extended 16 LSBs */
  int i=0;
  sp = (char*)sp-spofft;
  do {
    if(i &lt; max_size) {
      retaddrs[i++] = ra;
    }
  }
  while(get_prev_sp_ra(&amp;sp,&amp;ra,sp,ra));
  return i; /* stack size */
}</pre>
<p>get_sp and get_ra access registers so they must be assembly macros, which in the case of Green Hills can be spelled like this:</p>
<pre>asm void* get_ra() {
  move $v0, $ra
}
asm void* get_sp() {
  move $v0, $sp
}</pre>
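<p>Since the heuristic only reads words of code and stack, it can be exercised off-target against simulated memory. A sketch, assuming code and stack are word arrays indexed by byte address divided by 4 and that a return address outside the code array ends the walk. sim_prev_sp_ra and sim_call_stack are hypothetical counterparts of get_prev_sp_ra and get_call_stack_no_fp, with bounds checks in place of the elided &#8220;scanned too much&#8221; test.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Simulated target memory: word arrays indexed by byte address / 4. */
struct sim_mem {
  const uint32_t *code;
  size_t code_words;
  const uint32_t *stack;
  size_t stack_words;
};

static int32_t imm16(uint32_t insn) {
  int32_t imm = (int32_t)(insn & 0xffffu);
  return imm >= 0x8000 ? imm - 0x10000 : imm;
}

/* Same heuristic as get_prev_sp_ra, on simulated memory. */
int sim_prev_sp_ra(const struct sim_mem *m, uint32_t sp, uint32_t ra,
                   uint32_t *prev_sp, uint32_t *prev_ra) {
  size_t w = ra / 4;
  if (w >= m->code_words)
    return 0;                             /* not in known code */
  while ((m->code[w] >> 16) != 0x27bd) {  /* back to addiu sp,sp,spofft */
    if (w == 0)
      return 0;                           /* scanned too much */
    w--;
  }
  int32_t spofft = imm16(m->code[w]);
  for (; w < ra / 4; w++) {               /* forward to sw ra,raofft(sp) */
    if ((m->code[w] >> 16) == 0xafbf) {
      size_t slot = (sp + (uint32_t)imm16(m->code[w])) / 4;
      if (slot >= m->stack_words)
        return 0;
      *prev_ra = m->stack[slot];
      *prev_sp = sp - (uint32_t)spofft;   /* spofft < 0, so sp grows back */
      return 1;
    }
  }
  return 0;                               /* return address never saved */
}

/* Counterpart of get_call_stack_no_fp's loop, starting from (sp, ra). */
int sim_call_stack(const struct sim_mem *m, uint32_t sp, uint32_t ra,
                   uint32_t *retaddrs, int max_size) {
  int i = 0;
  do {
    if (i >= max_size)
      break;
    retaddrs[i++] = ra;
  } while (sim_prev_sp_ra(m, sp, ra, &sp, &ra));
  return i;
}
```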
<p>Under MIPS32 and Green Hills, this code seems to give decent call stacks except for the inevitable omission of function calls done without saving the return address; the most common case of those - simple delegating functions - was already mentioned above. If f calls g which does (almost) nothing except for calling h, and g doesn&#8217;t bother to save the return address to the stack, having h return directly to f, then the call stack will contain f and h but not g. Not much of a problem and sometimes even an advantage as far as I&#8217;m concerned, since g is rarely interesting. Also, you can get this problem even with a frame pointer - for example, gcc -O3 on x86 will not maintain the call stack accurately even without -fomit-frame-pointer, generating the following ridiculous code:</p>
<pre>pushl   %ebp
movl    %esp, %ebp
popl    %ebp
# so what was the point of setting up and
# immediately destroying a stack frame?
jmp     print_trace</pre>
<p>Now, although code like get_prev_sp_ra looking for 0x27bd admittedly is a heuristic relying on undocumented platform-specific behavior, it looks like a passable way of getting a call stack on a RISC machine like MIPS, ARM or PowerPC. What about the x86 though? Effectively we have our code partially disassembling itself here, which is not nearly as easy on the x86 (in particular, I don&#8217;t think there&#8217;s a way to scan backwards because of the variable-length encoding, although we could look at epilogues instead of prologues just as well).</p>
<p>Instead of dragging in a disassembler, we can use an external program, such as, well, a debugger. This obviously defeats the purpose of being able to get the call stack <em>without</em> a debugger. But this purpose isn&#8217;t very interesting on the x86 in the first place because there you&#8217;re rarely stuck in situations where a program can&#8217;t run a debugger.</p>
<p>The only point of the disassembling business on the x86 thus remains to deal with programs compiled without a frame pointer <em>and</em> without debug information making it possible to get the call stack nonetheless. I don&#8217;t know if anybody has such a problem these days, now that gcc has -fasynchronous-unwind-tables - perhaps someone uses compilers which can&#8217;t do this or binaries compiled without this, and perhaps the problem is extinct on the x86. For what it&#8217;s worth, <a href="http://yosefk.com/sw/bt">here&#8217;s</a> a script getting the call stack from a core file without relying on gdb&#8217;s bt command but relying on its disassemble command. Usage: python bt &lt;program&gt; &lt;core&gt;. No warranty, &#8230;or FITNESS FOR A PARTICULAR PURPOSE.</p>
<p>And this is all I know about getting the call stack in C or C++, something users of other languages can do, in the (unlikely) absence of a library function doing just that, simply by throwing an exception, immediately catching it and using its getStackTrace method or some such.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yosefk.com/blog/getting-the-call-stack-without-a-frame-pointer.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>What makes cover-up preferable to error handling</title>
		<link>http://www.yosefk.com/blog/what-makes-cover-up-preferable-to-error-handling.html</link>
		<comments>http://www.yosefk.com/blog/what-makes-cover-up-preferable-to-error-handling.html#comments</comments>
		<pubDate>Fri, 02 Oct 2009 23:31:32 +0000</pubDate>
		<dc:creator>Yossi Kreinin</dc:creator>
		
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.yosefk.com/blog/?p=93</guid>
		<description><![CDATA[Fuck you plenty? Very good, sir!]]></description>
			<content:encoded><![CDATA[<p>There was a Forth tutorial which I now fail to find that literally had a &#8220;crash course&#8221; right in the beginning, where you were shown how to crash a Forth interpreter. Not much of a challenge - `echo 0 @ | pforth` does the trick for me - but I liked the way of presentation: &#8220;now we&#8217;ve learned how to crash, no need to return to that in the future&#8221;.</p>
<p>So, let&#8217;s have a Python &amp; Perl crash course - do something illegal and see what happens. We&#8217;ll start with my favorite felony - out-of-bounds array access:</p>
<pre>python -c 'a=(1,2,3); print "5th:",a[5]'
perl -e '@a=(1,2,3); print "5th: $a[5]\n"'</pre>
<p>The output:</p>
<pre>5th:
Traceback (most recent call last):
  File "&lt;string&gt;", line 1, in &lt;module&gt;
IndexError: tuple index out of range
5th:</pre>
<p>Python in fact crashed, telling us it didn&#8217;t like the index.</p>
<p>Perl was kinder and rewarded our out-of-bounds index with what looks like the empty string. Being the kind of evildoer who&#8217;s only further provoked by the gentle reactions of a do-gooder, what shall we do to further harass it? Well, it looks like anything makes a good index (and I mean <em>anything</em>: if @a=(11,22), then $a["1"]==22 and $a["xxx"]==11). But perhaps some things don&#8217;t make good arrays.</p>
<pre>python -c 'x=5; print "2nd:",x[2]'
perl -e '$x=5; print "2nd: $x[2]\n"'</pre>
<p>Output:</p>
<pre>2nd:
Traceback (most recent call last):
  File "&lt;string&gt;", line 1, in &lt;module&gt;
TypeError: 'int' object is unsubscriptable
2nd:</pre>
<p>Python gives us its familiar elaborate complaints, while Perl gives us its familiar laconic empty string. Its kindness and flexibility are such that in a numeric context, it would helpfully give us the number 0 - the empty string is what we get in a string context.</p>
<p>Is there any way to exceed the limits of Perl&#8217;s patience? What about nonsensical operations - I dunno, concatenating a hash/dictionary/map/whatever you call it and a string?</p>
<pre>python -c 'map={1:2,3:4}; print "map+abc:",map+"abc"'
perl -e '%map=(1,2,3,4); print "map+abc: ",%map . "abc\n"'</pre>
<p>Output:</p>
<pre>map+abc:
Traceback (most recent call last):
  File "&lt;string&gt;", line 1, in &lt;module&gt;
TypeError: unsupported operand type(s) for +: 'dict' and 'str'
map+abc: 2/8abc</pre>
<p>Python doesn&#8217;t like our operands but Perl retains its non-judgmental frame of mind. A Perl detractor could point out that silently converting %map to &#8220;2/8&#8221; (used buckets/allocated buckets) is patently insane. A Perl aficionado could point out that Perl seems to be following Python&#8217;s motto &#8220;Explicit is better than implicit&#8221; better than Python itself. In Python you can&#8217;t tell the type of map at the point of usage. Perl code clearly states it&#8217;s a hash with %, moreover . specifically means string concatenation (as opposed to +). So arguably you get what you asked for. Well, the one thing that is not debatable is that we still can&#8217;t crash Perl.</p>
<p>OK, so Perl is happy with indexes which aren&#8217;t and it is happy with arrays which aren&#8217;t, and generally with variables of some type which aren&#8217;t. What about variables that simply aren&#8217;t?</p>
<pre>python -c 'print "y:",y'
perl -e 'print "y: $y\n"'</pre>
<p>Output:</p>
<pre>y:
Traceback (most recent call last):
  File "&lt;string&gt;", line 1, in &lt;module&gt;
NameError: name 'y' is not defined
y:</pre>
<p>NameError vs that hallmark of tolerance, the empty string, a helpful default value for a variable never defined.</p>
<p>By the way, this is how $x[5] evaluates to an empty string when x isn&#8217;t an array, I think. $x[5] is unrelated to the scalar variable $x, it looks for the array variable @x in another namespace. There&#8217;s no @x so you get an empty array, having no 5th element so you get &#8220;&#8221;. I think I understand it all, except for one thing: is there any way at all to disturb the divine serenity of this particular programming language?</p>
<pre>python -c 'a=0; print 1/a'
perl -e '$a=0; print 1/$a'</pre>
<p>This finally manages to produce:</p>
<pre>Traceback (most recent call last):
  File "&lt;string&gt;", line 1, in &lt;module&gt;
ZeroDivisionError: integer division or modulo by zero
Illegal division by zero at -e line 1.</pre>
<p>The second message about &#8220;illegal&#8221; division by zero (is there any other kind?) comes from our no longer tolerant friend, making me wonder. What is so special about division by zero? Why not be consistent with one&#8217;s generally calm demeanor and return something useful like &#8220;&#8221; or 0? That would be perfectly reasonable - I did it myself, or more accurately, asked to have a hardware divider return zero in these cases, because there wasn&#8217;t what hardware calls exception handling (having the processor jump to a handler in the middle of an instruction stream). We lived happily ever after, so what&#8217;s wrong with 0?</p>
<p>But the real question is, what explains the stunning difference between Perl&#8217;s and Python&#8217;s character? Is it philosophical, &#8220;There&#8217;s More Than One Way To Do It (TMTOWTDI)&#8221; vs &#8220;There should be one - and preferably only one - obvious way to do it; Although that way may not be obvious at first unless you&#8217;re Dutch&#8221; (actual Perl and Python mottos, respectively)? The latter approach encourages classifying program behavior as &#8220;erroneous&#8221; where the former tends to instead assume you&#8217;re knowingly doing something clever in Yet Another Possible Way, right?</p>
<p>Modernism vs Postmodernism, maybe, as outlined by Perl&#8217;s author in &#8220;<a href="http://www.wall.org/~larry/pm.html">Perl, the first postmodern computer language</a>&#8221;? &#8220;Perl is like the perfect butler. Whatever you ask Perl to do, it says `Very good, sir,&#8217; or `Very good, madam.&#8217; &#8230; Contrast that with the Modern idea of how a computer should behave. It&#8217;s really rather patronizing: `I&#8217;m sorry Dave. I can&#8217;t allow you to do that.&#8217;&#8221; The latter can be illustrated by Python&#8217;s way of answering the wish of its many users to use braces rather than indentation for scoping:</p>
<pre>&gt;&gt;&gt; from __future__ import braces
SyntaxError: not a chance</pre>
<p>So, &#8220;Very good, sir&#8221; vs &#8220;I can&#8217;t allow you to do that&#8221;. Makes sense with Python vs Perl, but what about, say, Lisp vs C++?</p>
<p>Lisp definitely has a &#8220;There&#8217;s More Than One Way To Do It&#8221; motif in its culture. Look how many control flow facilities it has compared to Python - and on top of that people write flow macros, and generally &#8220;if you don&#8217;t like the way built-in stuff works, you can fix it with macros&#8221;, you know the drill. Guessing the user&#8217;s intent in ambiguous situations? Perl <a href="http://perldoc.perl.org/perlop.html#Gory-details-of-parsing-quoted-constructs">says</a> that it &#8220;uses the DWIM (that&#8217;s &#8220;Do What I Mean&#8221;) principle&#8221; for parsing, <a href="http://pdp-10.trailing-edge.com/decuslib20-01/01/decus/20-0004/17lisp.tty">borrowing</a> a term from Lisp environments. And yet:</p>
<pre>(print (let ((x (make-array 3))) (aref x 5)))
*** - AREF: index 5 for #(NIL NIL NIL) is out of range
(print (let ((x 5)) (aref x 2)))
*** - AREF: argument 5 is not an array
(print (let ((m (make-hash-table))) (concatenate 'string m "def")))
*** - CONCATENATE: #S(HASH-TABLE :TEST FASTHASH-EQL) is not a SEQUENCE
(print y)
*** - EVAL: variable Y has no value
(print (/ 1 0))
*** - division by zero</pre>
<p>5 out of 5, just like Python. Contrast that with C++ which definitely has a Bondage and Discipline culture, what with all the lengthy compiler error messages. Actually C++ would score 4 out of 5 on this test, but the test is a poor fit for statically typed languages. A more appropriate way to evaluate C++&#8217;s error handling approach would be to focus on errors only detectable at run time. The following message from <a href="http://web.mit.edu/~simsong/www/ugh.pdf">The UNIX-HATERS Handbook</a> records the reaction of a lisper upon his encounter with this approach:</p>
<blockquote><p>Date: Mon, 8 Apr 91 11:29:56 PDT<br />
From: Daniel Weise<br />
To: UNIX-HATERS<br />
Subject: From their cradle to our grave.</p>
<p>One reason why Unix programs are so fragile and unrobust is that C<br />
coders are trained from infancy to make them that way. For example,<br />
one of the first complete programs in Stroustrup’s C++ book (the<br />
one after the “hello world” program, which, by the way, compiles<br />
into a 300K image), is a program that performs inch-to-centimeter<br />
and centimeter-to-inch conversion. The user indicates the unit of the<br />
input by appending “i” for inches and “c” for centimeters. Here is<br />
the outline of the program, written in true Unix and C style:</p>
<pre>#include &lt;stream.h&gt;

main() {
  [declarations]
  cin &gt;&gt; x &gt;&gt; ch;
    ;; A design abortion.
    ;; This reads x, then reads ch.
  if (ch == 'i') [handle "i" case]
  else if (ch == 'c') [handle "c" case]
  else in = cm = 0;
    ;; That’s right, don’t report an error.
    ;; Just do something arbitrary.
[perform conversion] }</pre>
<p>Thirteen pages later (page 31), an example is given that implements<br />
arrays with indexes that range from n to m, instead of the usual 0 to<br />
m. If the programmer gives an invalid index, the program just<br />
blithely returns the first element of the array. Unix brain death forever!</p></blockquote>
<p>You could say that the sarcasm in the Lisp-style comments proudly intermixed with C++ code is uncalled for since example programs are just that - example programs. As to the dreaded out-of-bound array access cited in the second example, well, C++ doesn&#8217;t handle that to avoid run time overhead.</p>
<p>But the cited program didn&#8217;t just ignore the range problem the way C would - it went to the trouble of checking the index and then helpfully returned the 0th element the way Perl would. Probably as one part of its illustration of how in C++ you could have custom array types which aren&#8217;t like C arrays. But why Postmodern Perl arrays, in a generally Disciplined language?</p>
<p>Well, it was 1991 and C++ exceptions were very young, likely younger than the cited example programs. (They&#8217;ve since aged but didn&#8217;t improve to the point where most library users would be happy to have to catch them, hence many library writers aren&#8217;t throwing them.)</p>
<p>Likewise, Perl had all the features used in the examples above before it had exceptions - or more accurately before it had them under the spotlight, if I understand this correctly. (In Perl you handle exceptions by putting code in an eval { &#8230; } block and then calls to the die() function jump to the end of that block, saving die&#8217;s argument to $@ - instead of, well, dying. I think Perl had this relatively early; however it seems to only have become idiomatic after Perl 5&#8217;s OO support and an option to send exception objects to $@, with people using die strictly for dying prior to that.) Perhaps Perl&#8217;s helpful interpretations of obvious nonsense like $a["xxx"] aren&#8217;t that helpful after all, but what would you rather have it do - die()?</p>
<p>AFAIK Python had exceptions under the spotlight from the beginning - although similarly to Perl it had exception strings before it had exception classes. And in fact it does its best to adhere to its &#8220;Errors should never pass silently&#8221; philosophy, the few deviations coming to mind having to do with Boolean contexts - the falsehood of empty strings, None and 0 together with None!=False/0==False/1==True/2!=True and similar gateways to <a href="http://www-formal.stanford.edu/jmc/history/lisp/node3.html">pornographic programming</a>.</p>
<p>Lisp has conditions and restarts which blow exceptions out of the water, hence its willingness to report errors isn&#8217;t surprising. However, it gained these features in the 80s; what did previous dialects do? ERRORSET, which is similar to a try/catch block, appears to predate the current error handling system, but it doesn&#8217;t seem to have been there from the very beginning, either. I&#8217;m not familiar with the Lisp fossil record, but there&#8217;s a function for indexing lists called NTH which returns NIL given an out-of-bounds index. Lists definitely predate arrays, so I assume NTH likewise predates AREF which complains given a bad index. Perhaps NTH doesn&#8217;t complain about bad indexes since it also predates ERRORSET and any other form of exception handling?</p>
<p>The pattern seems to be: if the language has exceptions, most of its features and libraries handle errors. If it doesn&#8217;t, they don&#8217;t; errors are covered up.</p>
<p>(Although I won&#8217;t be surprised if I&#8217;m wrong about the Lisp history part because Lisp is generally much more thoughtful than I am  - just look at all the trouble <a href="http://pdp-10.trailing-edge.com/decuslib20-01/01/decus/20-0004/16lisp.tty">Interlisp</a>, by now an ancient dialect, went into in order to figure out whether a user wants to get an opportunity to fix an error manually or would rather have the program silently return to the topmost ERRORSET.)</p>
<p>awk and early PHP lack exceptions and are happy with out-of-bound array access. Java and Ruby have exceptions and you&#8217;ll get one upon such access. It isn&#8217;t just the culture. Or is it? Perl is PHP&#8217;s father and awk is PHP&#8217;s grandfather. *sh and make, which, like Perl, produce an empty string from $nosuchvar, aren&#8217;t good examples, either - sh is Perl&#8217;s mother and make is Perl&#8217;s sister. Is it really the Unix lineage that is at fault as suggested by a dated message to a late mailing list?</p>
<p>Here&#8217;s JavaScript, an offspring of Lisp:</p>
<pre>&gt;&gt;&gt; [1,2,3][5]+5
NaN
&gt;&gt;&gt; [1,2,3][5]+"abc"
"undefinedabc"</pre>
<p>I think this definitely rivals Perl. Apparently it&#8217;s not the lineage that is the problem - and JavaScript didn&#8217;t have exceptions during its first 2 years.</p>
<p>The thing is, errors are exceedingly gnarly to handle without exceptions. Unless you know what to do at the point where the error is detected, and you almost never know what to do at the point where the error is detected, you need to propagate a description of the error up the call chain. The higher a function sits up the call chain, the more kinds of errors it will have to propagate upwards.</p>
<p>(With a C++ background it can look like the big problem doesn&#8217;t come from function calls but from operators and expressions which syntactically have nowhere to send their complaints. But a dynamic language could have those expressions evaluate to a special Error object just as easily as it can produce &#8220;&#8221; or &#8220;undefined&#8221;. What this wouldn&#8217;t solve is the need to clutter control flow somewhere down the road when deciding which control path to take or what should go to files or windows, now that every variable can have the value Error tainting all computations in its path.)</p>
<p>Different errors carry different meta-data with them - propagating error codes alone ought to be punishable by death. What good is a &#8220;no such file&#8221; error if I don&#8217;t know which file it is? What good is a &#8220;network layer malfunction&#8221; error if I can&#8217;t tell that in fact it&#8217;s a &#8220;no such file&#8221; error at the network layer because a higher level lumps all error codes from a lower level into one code? (It always lumps them together since the layers have different lifecycles and the higher level can&#8217;t be bothered to update its error code list every time the lower level does.)</p>
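<p>To make the cost concrete in C, a language without exceptions: a minimal sketch in which the error carries the file name along with the code, and every layer has to test the call and pass the error on untouched. The struct error type and the load_profile/start_app functions are hypothetical.</p>

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical error record: a code plus the metadata needed to act on it. */
struct error {
  int code;        /* errno-style code */
  char what[128];  /* e.g. the name of the file involved */
};

static int fail(struct error *err, int code, const char *what) {
  err->code = code;
  snprintf(err->what, sizeof err->what, "%s", what);
  return -1;
}

/* Lowest layer: reports which file could not be opened. */
int load_profile(const char *path, struct error *err) {
  FILE *f = fopen(path, "r");
  if (!f)
    return fail(err, errno, path);
  fclose(f);
  return 0;
}

/* Higher layer: must test the call and forward the error untouched;
   collapsing it into a generic code of its own would lose the file name. */
int start_app(const char *profile, struct error *err) {
  if (load_profile(profile, err) != 0)
    return -1;
  return 0;
}
```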
<p>Different meta-data means, importantly for static languages, different types of output objects from callee functions depending on the error, which can only be handled in an ugly fashion. But even if there&#8217;s no such problem, you have to test every function call for errors. If a function didn&#8217;t originally return error information but now does, you have to change all points of call.</p>
<p>Nobody ever does it.</p>
<p>To me, the fact that apparently all programming languages without exception handling cover up errors at the language level sets an upper bound on how well errors get handled without exceptions.</p>
<p>A designer of a successful programming language, whatever you think of that language, is definitely an above-average programmer - actually, I think, one that can safely be assumed to occupy the top tenth no matter how you rate programming ability (and really, how do you?). Moreover, the language designer is also in the top tenth if programmers are sorted by the importance they assign to their work and their motivation to get things right - because the problems are fundamental, because of the ego trip, because of everything.</p>
<p>And still, once decision or chance leads them into a situation where the language has no exceptions, they allow errors to be covered up throughout their cherished system, despite the obvious pain it causes them, as evident from the need for rationalizations ranging from runtime overhead (as if you couldn&#8217;t have an explicit unsafe subset for that - see C#) to the Postmodern Butler argument (&#8220;Fuck you plenty? Very good, sir!&#8221;).</p>
<p>What does this leave us to expect from average programs? The ones written in the order of execution, top to bottom, with the same intellectual effort that goes into driving from point A to point B? The ones not exposed to the kind of massive testing/review that could exterminate bugs in a flophouse?</p>
<p>This is why nothing inspires my trust in a program like a tendency to occasionally wet its pants with an error call stack. Leaving some exceptions uncaught proves the programmers are somewhat sloppy but I wouldn&#8217;t guess otherwise anyway. What&#8217;s more important is that because of exceptions along the road, the program most likely won&#8217;t make it to the code where shooting me in the foot is implemented. And if a program never spills its guts this way, especially if it&#8217;s the kind of program with lots of I/O going on, I&#8217;m certain that under the hood, it&#8217;s busy silencing desperate screams: where there are no exceptions, there must be cover-up.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yosefk.com/blog/what-makes-cover-up-preferable-to-error-handling.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>The C++ Sucks Series: petrifying functions</title>
		<link>http://www.yosefk.com/blog/the-c-sucks-series-petrifying-functions.html</link>
		<comments>http://www.yosefk.com/blog/the-c-sucks-series-petrifying-functions.html#comments</comments>
		<pubDate>Fri, 04 Sep 2009 21:13:54 +0000</pubDate>
		<dc:creator>Yossi Kreinin</dc:creator>
		
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.yosefk.com/blog/?p=91</guid>
		<description><![CDATA[I'll argue that you can't get a first-rate user experience with this high-level object-oriented programming language if you don't also know assembly. And with the first-rate experience being the living hell that it is, few would willingly opt for a second-rate option.]]></description>
			<content:encoded><![CDATA[<p>Your IT department intends to upgrade your OS and gives a developer a new image to play with. The developer is generally satisfied, except there&#8217;s this one program that mysteriously dumps core. Someone thoughtfully blames differences in system libraries.</p>
<p>Alternative prelude: you have this program and you&#8217;re working on a new version. Being generally satisfied with the updates, you send the code overseas. They build it and it mysteriously dumps core. Someone thoughtfully blames differences in the compiler version.</p>
<p>Whatever the prelude, you open the core dump with `gdb app core` and gdb says:</p>
<pre>#0  0x080484c9 in main (argc=Cannot access memory at address 0xbf3e7a8c) at main.cpp:4
4    int main(int argc, char** argv)
(gdb)</pre>
<p>Check out the garbage near &#8220;argc=&#8221; - if it ain&#8217;t printing garbage, it ain&#8217;t a C++ debugger. Anyway, it looks like the program didn&#8217;t even enter main. An alert C++ hater will immediately suspect that the <a href="http://www.yosefk.com/blog/the-c-sucks-series-the-quest-for-the-entry-point.html">flying circus</a> happening in C++ before main could be at fault, but in this case it isn&#8217;t. In fact, a program can be similarly petrified by the prospect of entering any function, not necessarily main. It&#8217;s main where it crashes in our example because the example is small; here&#8217;s the source code:</p>
<pre>#include &lt;stdio.h&gt;
#include "app.h"

int main(int argc, char** argv)
{
  if(argc != 2) {
    printf("please specify a profile\n");
    return 1;
  }
  const char* profile = argv[1];
  Application app(profile);
  app.mainLoop();
}</pre>
<p>On your machine, you run the program without any arguments and sure enough, it says &#8220;please specify a profile&#8221;; on this other machine, it just dumps core. Hmmm.</p>
<p>Now, I won&#8217;t argue that C++ isn&#8217;t a high-level object-oriented programming language since every book on the subject is careful to point out the opposite. Instead I&#8217;ll argue that you can&#8217;t get a <em>first-rate</em> user experience with this high-level object-oriented programming language if you don&#8217;t also know assembly. And with the first-rate experience being the living hell that it is, few would willingly opt for a second-rate option.</p>
<p>For example, nothing at the source code level can explain how a program is so shocked by the necessity of running main that it dumps a core in its pants. On the other hand, here&#8217;s what we get at the assembly level:</p>
<pre>(gdb) p $pc
$1 = (void (*)(void)) 0x80484c9 &lt;main+20&gt;
(gdb) disass $pc
Dump of assembler code for function main:
0x080484b5 &lt;main+0&gt;:    lea    0x4(%esp),%ecx
0x080484b9 &lt;main+4&gt;:    and    $0xfffffff0,%esp
0x080484bc &lt;main+7&gt;:    pushl  -0x4(%ecx)
0x080484bf &lt;main+10&gt;:    push   %ebp
0x080484c0 &lt;main+11&gt;:    mov    %esp,%ebp
0x080484c2 &lt;main+13&gt;:    push   %ecx
0x080484c3 &lt;main+14&gt;:    sub    $0xa00024,%esp
0x080484c9 &lt;main+20&gt;:    mov    %ecx,-0xa0001c(%ebp)
# we don't care about code past $pc -
# a screenful of assembly elided</pre>
<p>What this says is that the offending instruction is at the address main+20. As you&#8217;d expect with a Segmentation fault or a Bus error core dump, this points to an instruction accessing memory, specifically, the stack.</p>
<p>BTW I don&#8217;t really know x86 assembly, but I can still read it thusly: &#8220;mov&#8221; can&#8217;t just mean the tame RISC &#8220;move between registers&#8221; thing because then we wouldn&#8217;t crash, so one operand must spell a memory address. Without remembering the source/destination order of the GNU assembler (which AFAIK is the opposite of the usual), I can tell that it&#8217;s the second operand that is the memory operand because there&#8217;s an integer constant which must mean an offset or something, and why would you need a constant to specify a register operand? Furthermore, I happen to remember that %ebp is the frame pointer register, which means that it points into the stack, though I could also figure it out from a previous instruction at main+11, which moves %esp [ought to be the <strong>s</strong>tack <strong>p</strong>ointer] to %ebp (or vice versa, as you could think without knowing the GNU operand ordering - but it would <em>still</em> mean that %ebp points into the stack.)</p>
<p>Which goes to show that you can read assembly while operating from a knowledge base that is not very dense, <a href="http://www.bitmover.com/lm/quotes.html">a way</a> of saying &#8220;without really knowing what you&#8217;re doing&#8221; - try that with C++ library code; but I digress. Now, why would we fail to access the stack? Could it have to do with the fact that we apparently access it with the offset -0xa0001c, which ought to be unusually large? Let&#8217;s have a look at the local variables, hoping that we can figure out the size of the stack main needs from their sizes. (Of course if the function used a Matrix class of the sort where the matrix is kept by value right there in a flat member array, looking at the named local variables mentioned in the program wouldn&#8217;t be enough since the temporaries returned by overloaded operators would also have to be taken into account; luckily this isn&#8217;t the case.)</p>
<pre>(gdb) info locals
# if it ain't printing garbage, it ain't a C++ debugger:
profile = 0xb7fd9870 "U\211?WVS??\207"
app = Cannot access memory at address 0xbf3e7a98</pre>
<p>We got two local variables; at least one must be huge then. (It can be worse in real life, main functions being perhaps the worst offenders, as many people are too arrogant to start with an Application class. Instead they have an InputParser and an OutputProducer and a Processor, which they proudly use in a neat 5-line main function - why wrap that in a <em>class</em>, 2 files in C++-land? Then they add an InputValidator, an OutputFormatConfigurator and a ProfileLoader, then less sophisticated people gradually add 20 to 100 locals for doing things right there in main, and then nobody wants to refactor the mess because of all the local variables you&#8217;d have to pass around; whereas an Application class with two hundred members, while disgusting, at least makes helper functions easy. But I digress again.)</p>
<pre>(gdb) p sizeof profile
$2 = 4
(gdb) p sizeof app
$3 = 10485768</pre>
<p>&#8220;10485768&#8221;. The trouble with C++ debuggers is that they routinely print so much garbage due to memory corruption, debug information inadequacy and plain stupidity that their users are accustomed to automatically ignore most of their output without giving it much thought. In particular, large numbers with no apparent regularity in their digits are to a C++ programmer what &#8220;viagra&#8221; is to a spam filter: a sure clue that something was overwritten somewhere and the number shouldn&#8217;t be trusted (I rarely do pair programming but I do lots of pair debugging and people explicitly shared this spam filtering heuristic with me).</p>
<p>However, in this case overwriting is unlikely since a sizeof is a compile time constant stored in the debug information and not in the program memory. We can see that the number will &#8220;make more sense&#8221; in hexadecimal (which is why hex is generally a good thing to look at before ignoring &#8220;garbage&#8221;):</p>
<pre>(gdb) p /x sizeof app
$4 = 0xa00008</pre>
<p>&#8230;Which is similar to our offset value, and confirms that we&#8217;ve been debugging a plain and simple stack overflow. Which would be easy to see in the case of a recursive function, or if the program crashed, say, in an attempt to access a large local array. However, in C++ it will crash near the beginning of a function long before the offending local variable is even declared, in an attempt to push the frame pointer or some such; I think I also saw it crash in innocent-looking places further down the road, but I can&#8217;t reproduce it.</p>
<p>Now we must find out which member of the Application class is the huge one, which is lots of fun when members are plentiful and deeply nested, which, with a typical Application class, they are. Some languages have reflection, with which we could traverse the member tree automatically; incidentally, most of those languages don&#8217;t dump core though. Anyway, in our case finding the problem is easy because I&#8217;ve made the example small.</p>
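<p>The arithmetic behind &#8220;similar&#8221; is easy to check: 0xa00000 is exactly 10 MiB, and the leftover 8 bytes look like a couple of 32-bit pointers. A compile-time sketch of that guess follows. <code>GuessedLayout</code> is a hypothetical stand-in rather than the real class, splitting the 10 MiB as 1024 x 10240 is an assumption, 8 MiB is assumed as a typical default stack limit, and 0xa00024 is the frame size taken from the <code>sub</code> instruction in the disassembly.</p>

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for the guessed layout: a pointer, a 10 MiB char
// buffer (split as 1024 x 10240 by assumption) and another pointer.
struct GuessedLayout {
  const char* before;
  char buf[1024][10240];
  void* after;
};

constexpr std::size_t kBufBytes = 1024 * 10240;             // 10485760 bytes
constexpr std::size_t kAssumedStackLimit = 8 * 1024 * 1024;  // 8 MiB, assumed
constexpr std::size_t kFrameBytes = 0xa00024;                // from `sub $0xa00024,%esp`

static_assert(kBufBytes == 0xa00000, "the buffer alone is 10 MiB");

// On common 32- and 64-bit ABIs no padding is needed, so the object is the
// buffer plus two pointers: 0xa00008 with 4-byte pointers (gdb's answer),
// 0xa00010 with 8-byte ones.
static_assert(sizeof(GuessedLayout) == kBufBytes + 2 * sizeof(void*),
              "buffer plus two pointers");

// Either way, both the object and main's frame exceed an 8 MiB stack.
static_assert(sizeof(GuessedLayout) > kAssumedStackLimit, "object exceeds 8 MiB");
static_assert(kFrameBytes > kAssumedStackLimit, "frame exceeds 8 MiB");

// Prints the size in decimal and hex, like `p sizeof` and `p /x sizeof`.
void printGuessedSize() {
  std::printf("sizeof = %zu = 0x%zx\n", sizeof(GuessedLayout), sizeof(GuessedLayout));
}
```

<p>Since <code>sizeof</code> is a compile-time constant, none of this needs the object to exist, which is also why gdb can answer the question even though it cannot read <code>app</code> itself.</p>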
<p>(I also tried to make it ridiculous - do you tend to ridicule pedestrian code, including your own, sometimes as you type? Few do and the scarcity makes them very dear to me.)</p>
<pre>class Application
{
 public:
  Application(const char* profile);
  void mainLoop();
 private:
  static const int MAX_BUF_SIZE = 1024;
  static const int MAX_PROF = 1024*10;
  const char* _profPath;
  char _parseBuf[MAX_BUF_SIZE][MAX_PROF];
  Profile* _profile;
};</pre>
<p>This shows that it&#8217;s _parseBuf that&#8217;s causing the problem. This also answers the objection of an alert C++ apologist that all of the above is not special to C++ but also relevant to C (when faced with a usability problem, C++ apologists like to ignore it and instead concentrate on assigning blame; if a problem reproduces in C, it&#8217;s not C++&#8217;s fault according to their warped value systems.) Well, while one could write equivalent C code causing a similar problem, one is unlikely to do so because C doesn&#8217;t have a private keyword, which to a first approximation does nothing but is advertised as an &#8220;<a href="http://yosefk.com/c++fqa/defective.html#defect-1">encapsulation mechanism</a>&#8221;.</p>
<p>In other words, an average C programmer would have a createApplication function which would malloc an Application struct and all would be well since the huge _parseBuf wouldn&#8217;t land on the stack. Of course an average C++ programmer, assuming he found someone to decipher the core dump for him as opposed to giving up on the OS upgrade or the overseas code upgrade, could also allocate the Application class dynamically, which would force him to change an unknown number of lines in the client code. Or he could change _parseBuf&#8217;s type to std::vector, which would force him to change an unknown number of lines in the implementation code, depending on the nesting of function calls from Application. Alternatively the average C++ programmer could change _parseBuf to be a reference, new it in the constructor(s) and delete it in the destructor, assuming he can find someone who explains to him how to declare references to 2D arrays.</p>
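<p>For the record, here is one way to declare the reference that last option needs. It is a minimal sketch rather than the post&#8217;s code: the <code>at()</code> accessor is a hypothetical helper added only for testing, <code>Profile*</code> is replaced with <code>void*</code> since Profile isn&#8217;t shown, and zero-filling the buffer is a choice, not a requirement. The trick is to allocate a one-element array of 2D arrays, so that dereferencing the result of <code>new</code> yields exactly the <code>char (&amp;)[MAX_BUF_SIZE][MAX_PROF]</code> the member wants.</p>

```cpp
#include <cassert>

// A sketch of Application with _parseBuf turned into a reference to a heap
// array: sizeof(Application) drops to a few pointers, while code that
// indexes _parseBuf[i][j] stays unchanged.
class Application {
 public:
  static const int MAX_BUF_SIZE = 1024;
  static const int MAX_PROF = 1024 * 10;

  // new char[1][R][C]() returns char (*)[R][C]; dereferencing it gives the
  // char (&)[R][C] the member needs. The trailing () zero-fills the buffer.
  explicit Application(const char* profile)
      : _profPath(profile),
        _parseBuf(*new char[1][MAX_BUF_SIZE][MAX_PROF]()),
        _profile(nullptr) {}

  // &_parseBuf is the same char (*)[R][C] pointer that new[] returned.
  ~Application() { delete[] &_parseBuf; }

  // Copies would delete the same buffer twice, so forbid them.
  Application(const Application&) = delete;
  Application& operator=(const Application&) = delete;

  // Hypothetical accessor, only here so the sketch can be exercised.
  char& at(int row, int col) { return _parseBuf[row][col]; }

 private:
  const char* _profPath;
  char (&_parseBuf)[MAX_BUF_SIZE][MAX_PROF];  // reference to a 2D array
  void* _profile;  // stands in for Profile*, which the post doesn't show
};
```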
<p>However, suppose you don&#8217;t want to change code but instead would like to make old code run on the new machine - a perfectly legitimate desire independently of the quality of the code and its source language. The way to do it under Linux/tcsh is:</p>
<pre>unlimit stacksize</pre>
<p>Once this is done, the program should no longer dump core. Running `limit stacksize` beforehand shows you the original limit, which AFAIK will differ across Linux installations and sometimes will depend on the user (say, if you ssh to someone&#8217;s desktop, you can get a lesser default stacksize limit and won&#8217;t be able to run the wretched program). For example, on my <a href="http://wubi-installer.org/">wubi</a> installation (Ubuntu for technophobes like myself who have a Windows machine, want a Linux, and hate the idea of fiddling with partitions), `limit stacksize` reports the value of 8M.</p>
<p>Which, as we&#8217;ve just seen, is tiny.</p>
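<p><code>unlimit stacksize</code> is tcsh syntax; in bash the equivalents are <code>ulimit -s unlimited</code> to lift the limit and <code>ulimit -s</code> to print it (in KiB). A program can also look the limit up itself through POSIX <code>getrlimit</code> with <code>RLIMIT_STACK</code>. The sketch below assumes a POSIX system; <code>fitsOnStack</code> and its 64 KiB of slack for the rest of the stack are hypothetical, and the check is only a rough heuristic since it ignores whatever the callers have already pushed.</p>

```cpp
#include <sys/resource.h>
#include <cassert>
#include <cstddef>
#include <cstdio>

// Soft stack limit in bytes (RLIM_INFINITY if unlimited), or 0 if
// getrlimit fails. This is the value `limit stacksize` / `ulimit -s` show.
rlim_t softStackLimit() {
  struct rlimit rl;
  if (getrlimit(RLIMIT_STACK, &rl) != 0) return 0;
  return rl.rlim_cur;
}

// Rough guess: would an object of `bytes` bytes, plus `slack` bytes for the
// rest of the stack, fit under the current soft limit?
bool fitsOnStack(std::size_t bytes, std::size_t slack = 64 * 1024) {
  rlim_t limit = softStackLimit();
  if (limit == RLIM_INFINITY) return true;
  return bytes + slack < limit;
}

void reportStackLimit() {
  rlim_t limit = softStackLimit();
  if (limit == RLIM_INFINITY)
    std::printf("stacksize: unlimited\n");
  else
    std::printf("stacksize: %llu KiB\n", static_cast<unsigned long long>(limit / 1024));
}
```

<p>Under an 8 MiB soft limit, <code>fitsOnStack(0xa00008)</code> returns false, which is the whole story of this post in one call.</p>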
]]></content:encoded>
			<wfw:commentRss>http://www.yosefk.com/blog/the-c-sucks-series-petrifying-functions.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Coding standards: having more errors in code than code</title>
		<link>http://www.yosefk.com/blog/coding-standards-having-more-errors-in-code-than-code.html</link>
		<comments>http://www.yosefk.com/blog/coding-standards-having-more-errors-in-code-than-code.html#comments</comments>
		<pubDate>Thu, 20 Aug 2009 19:24:11 +0000</pubDate>
		<dc:creator>Yossi Kreinin</dc:creator>
		
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.yosefk.com/blog/?p=90</guid>
		<description><![CDATA[I ran LINT version 9, configured to report the violations of the rules in the MISRA C++ 2008 coding standard, on a C++ source file. LINT is perhaps the most famous tool for statically checking C and C++ source code. MISRA stands for the Motor Industry Software Reliability Association, mandating adherence to its coding standards [...]]]></description>
			<content:encoded><![CDATA[<p>I ran <a href="http://www.gimpel.com/">LINT</a> version 9, configured to report the violations of the rules in the <a href="http://www.misra-cpp.org/">MISRA C++ 2008</a> coding standard, on a C++ source file. LINT is perhaps the most famous tool for statically checking C and C++ source code. MISRA stands for the Motor Industry Software Reliability Association, mandating adherence to its coding standards throughout the automotive industry.</p>
<p>The source file I tried has several KLOC worth of code, and the output of the preprocessor takes about 1M - pretty normal for C++ where a &#8220;Hello, world!&#8221; program generates 3/4M of preprocessed output. The output of LINT takes 38M. That&#8217;s 38x more errors than code.</p>
<p>We&#8217;re not finished parsing this output so I&#8217;m not sure which rules cause most violations and whether they can be clustered somehow to compress the 38M into something resembling comprehensible narrative in contents and size. The only thing basic attempts at parsing revealed at this point is that the distribution of the violations is roughly geometric, with the majority of the errors reporting violations of a minority of the rules.</p>
<p>Therefore, my only way of conveying some insight into the MISRA rules enforced by LINT is to look at a toy example. My example will be a Hello, world program - 2 LOC or 3/4M worth of code depending on your perspective. I&#8217;ll assume LINT is told to ignore standard libraries, so it will actually be closer to 2 LOC.<br />
<code><br />
#include &lt;iostream&gt;<br />
int main() { std::cout &lt;&lt; "Hello, world" &lt;&lt; std::endl; }</code></p>
<p>From this program, LINT will produce 4 error messages when configured to enforce MISRA C++ 2008:</p>
<ol>
<li>The &#8220;int&#8221; in &#8220;int main&#8221; violates an advisory rule to avoid using built-in types and instead use typedefs indicating the size and signedness of the type, such as int32_t, INT or signed32T. Many an automotive project uses a mixture of 2 or 3 of these conventions, which is compliant with the MISRA guidelines and presumably results from the history of merging or integrating code bases and/or teams. (I believe that in the particular case of main, the C and C++ standards both mandate the use of int; I didn&#8217;t check if you can use a typedef to spell int but I&#8217;m certain that you can&#8217;t have main() return an int32_t on a platform where int is 16b. Anyway, it appears that LINT doesn&#8217;t bother to special-case main() - but you can do that yourself in its configuration file or right there in the source code, as you will have to do in many other cases.)</li>
<li>The first left shift operator violates a MISRA rule disallowing the use of bitwise shift on signed types, or so it does according to LINT, which presumably checks whether the operands are of an unsigned integral type and reports an error if they are not (the other option is that it figures an output stream or a literal character array are &#8220;signed&#8221;, but I can&#8217;t see how they can be unless it&#8217;s a signature we&#8217;re talking about rather than signedness). The MISRA rule is based on the fact that the behavior of bitwise shift is implementation-defined and thus not portable. I do believe that there does not exist a 32b machine which does not use the 2&#8217;s complement representation for integers and is a target of an automotive application. A notable share of automotive applications use signed integers to represent fixed point numbers, and I believe all of them rely on the 2&#8217;s complement semantics of bitwise shifts to emulate multiplication and division.</li>
<li>The second left shift operator is reported as violating the same rule.</li>
<li>The two left shift operators as a whole are reported to violate the rule disallowing dependence on C operator precedence. That is, in order to correctly understand this program, a reader would have to know that (std::cout &lt;&lt; &#8220;Hello, world&#8221;) would be evaluated first and then its output would be shifted to the left by std::endl. MISRA strives to prevent confusion, based on a well-founded assumption that few programmers know the rules of operator precedence and evaluation order, and LINT enforces the rules defined based on these premises.</li>
</ol>
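<p>The fixed-point use of signed shifts mentioned in the second item might look like the sketch below. The Q16.16 format and the helper names are assumptions for illustration; neither MISRA nor the post prescribes them. Note that right-shifting a negative value is implementation-defined before C++20 (C++20 makes it an arithmetic shift), and that it rounds toward minus infinity where division rounds toward zero, so the two aren&#8217;t interchangeable even on 2&#8217;s complement machines.</p>

```cpp
#include <cassert>
#include <cstdint>

// Q16.16 fixed point in a signed 32-bit integer (an assumed format).
using q16_16 = std::int32_t;
constexpr int kFracBits = 16;

// Built with multiplication rather than <<, since left-shifting a negative
// value is undefined before C++20.
constexpr q16_16 fromInt(std::int32_t x) { return x * (1 << kFracBits); }

// Multiply in 64 bits, then drop the extra fraction bits with >>. For a
// negative product this relies on >> being an arithmetic shift: the 2's
// complement behavior that is implementation-defined before C++20.
constexpr q16_16 mul(q16_16 a, q16_16 b) {
  return static_cast<q16_16>((static_cast<std::int64_t>(a) * b) >> kFracBits);
}

// Shift and division disagree on negative odd values: on 2's complement
// targets -7 >> 1 == -4 (toward minus infinity) while -7 / 2 == -3.
constexpr std::int32_t halveByShift(std::int32_t x) { return x >> 1; }
constexpr std::int32_t halveByDivision(std::int32_t x) { return x / 2; }
```

<p>So replacing <code>x / 2</code> with <code>x &gt;&gt; 1</code> in such code is only safe where the values are known to be non-negative or the rounding difference doesn&#8217;t matter.</p>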
<p>I hope this gives some insight on the general code/errors ratio.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yosefk.com/blog/coding-standards-having-more-errors-in-code-than-code.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>The nomadic programmer</title>
		<link>http://www.yosefk.com/blog/the-nomadic-programmer.html</link>
		<comments>http://www.yosefk.com/blog/the-nomadic-programmer.html#comments</comments>
		<pubDate>Thu, 06 Aug 2009 21:38:22 +0000</pubDate>
		<dc:creator>Yossi Kreinin</dc:creator>
		
		<category><![CDATA[wetware]]></category>

		<guid isPermaLink="false">http://www.yosefk.com/blog/?p=89</guid>
		<description><![CDATA[Perhaps there are ways to improve the quality of life in Siberia, however this is not nearly as important politically as the fact that there's already a guy exclusively responsible for the quality of life in Siberia.]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s this broad metaphor I have, with no conclusions attached - just an attempt to describe dynamics. I&#8217;ve recently shared it with <a title="...also a programmer. The blog is in Hebrew." href="http://morehhadash.blogspot.com/">the commie ex-VP school teacher</a> and he liked it, so I thought it could fit well with the other borderline stuff I host at Proper Fixation under the &#8220;wetware&#8221; category.</p>
<p>So. One recurring theme in the history of civilization is the conflict between nomadic and settled people. Nomads think that land is for feeding cattle and you move elsewhere once there&#8217;s nothing left to graze. Villagers figure that land is for growing food, so you settle on it and fertilize it and irrigate it and stuff. Initially, nomads typically dominate the landscape, periodically attacking the settled villagers and taking their crops. However, the settled people eventually accumulate enough surplus to support cities, nation states and standing armies, extending their control to more and more lands and eventually exterminating the nomadic lifestyle altogether.</p>
<p>The way I painted this picture, I tend to side with the hard-working settled folk, the nomads being the parasitic losers I&#8217;ve depicted, and I think most of us civilized humans share similar sentiments. However, in my metaphor I side with the nomadic programmer, at least to a large extent, and I do so because of the meaning my metaphor assigns to &#8220;land&#8221;.</p>
<p>The thing I find analogous to land in programming is <em>problems</em>, because that&#8217;s where programmers live. Programmers live on (in?) problems in the sense of dealing with broken things most of the time - once something starts working, you move on to something that doesn&#8217;t. In another sense, large problems or problem areas a programmer deals with define that programmer&#8217;s territory. The programmer is in immediate demand to the extent that solutions to &#8220;his&#8221; problems are in demand; problems feed programmers. Strong programmers seek, in one way or another, to expand their responsibility to encompass more problems, and to preserve their existing responsibilities. And so on.</p>
<p>Now if we restate the respective worldviews of nomads and settlers in the terms of this metaphor, we&#8217;ll get this. Nomads think that problems exist for solving them and you move elsewhere once there&#8217;s nothing left to graze. Settlers think that problems exist for <em>growing</em> them, so they settle on them and fertilize them and irrigate them and stuff.</p>
<p>And now you can see why I&#8217;m inclined to sympathize with the nomadic programmer. Two other things fueling this sympathy are issues of personality to be discussed soon, and the fate of the nomad to be discussed immediately. And while the nomad is no longer the parasite, rest assured that he&#8217;s still, in the long run, the loser.</p>
<p>Initially - in a young and small organization - nomadic programmers tend to dominate the landscape. There are more problems than people around. The nomadic programmer travels from one urgent problem to another, grazing through them as fast as he can. Occasionally he stumbles upon a settler who has settled on a problem near the nomad&#8217;s territory and grown crops of code there. Well, if the problem occupied by the settler becomes urgent, or if the crops stand in the way of solving the nomad&#8217;s adjacent urgent problem, the nomad will go ahead and brutally solve the settler&#8217;s problem, wiping out his crops. The politics of the invasion will be trivial - the nomad&#8217;s promise to deliver carries lots of weight at this stage and the settler will not issue a counter-promise (to deliver in his own way) because he&#8217;s a peaceful code-growing villager who isn&#8217;t into the stress that necessarily comes with delivering quickly.</p>
<p>However, time goes by and sure enough, the settled people accumulate quite a surplus. What you grow on land is surplus wheat; what you grow on problems is surplus code. Code that wouldn&#8217;t naturally grow on a problem - but now that the problem has been fertilized by the original settlers, they&#8217;ve grown enough code on it to support whole cities, a nation state, and a standing army of programmers, all making a living by fiddling with this code.</p>
<p>The nomad starts running out of pasture. Sure enough, there are lots of problems just like there used to be. But you can no longer solve them because (1) now it&#8217;s the majority and not a minority of problems that are already owned by someone (growing them rather than solving them) and (2) in most cases invasion is no longer an option. Now that a problem is owned by a nation state, responsible for lots of code and with lots of people working on that code, the nomad&#8217;s promise to deliver quickly carries very little weight compared to the danger of irritating the sovereign. While it is quite likely <em>still</em> true that a nomad would deliver more quickly than the whole nation state team, the nomad will not be able to take over the entire responsibility of the team. (It is possible that the single reason for the latter is the problems grown by the team itself and that a few nomads could in fact handle the <em>original</em> problem. But it is irrelevant since problems that could have been avoided are no less real than problems that couldn&#8217;t.)</p>
<p>So if the organization, by some decision-making mechanism, lets the nomad invade the territory of the settled team and <em>solve</em> the stupid problem, and then the offended team, by some decision-making mechanism, fights back by effectively going on strike, there is nothing the nomad will be able to offer the organization at this point. Of course it doesn&#8217;t have to come to this, just like political conflicts don&#8217;t have to come to full-scale wars, or personal conflicts to fist fights or court hearings. It&#8217;s enough for the worst-case scenario to be likely to work out in favor of A rather than B to shift the balance decisively in favor of A. Even if neither A nor B nor anyone making decisions affecting A and B actually thinks in terms of this scenario, things tend to evolve and adapt such that decisions are made in favor of A. And in our case, the nomadic programmer is B.</p>
<p>Solving problems just isn&#8217;t the big thing in this organization anymore, just like the quality of life experienced by the inhabitants of some territory isn&#8217;t the main theme in international politics. Perhaps there are ways to improve the quality of life in Siberia, however this is not nearly as important politically as the fact that there&#8217;s already <a title="That's the guy at the time of writing." href="http://en.wikipedia.org/wiki/Vladimir_Putin">a guy</a> exclusively responsible for the quality of life in Siberia. Perhaps <a href="http://en.wikipedia.org/wiki/Socialism_with_Chinese_characteristics">Socialism with Chinese Characteristics</a> could yield improvements in the lives of Siberians that <a href="http://www.carnegieendowment.org/events/index.cfm?fa=eventDetail&amp;id=819">Managed Democracy</a> could not, however, if the Chinese try to act on this assumption, there will be a nuclear war. If what remains of a nomadic tribe somewhere in the region makes a similar attempt, then it will remain no more.</p>
<p>The disgruntled nomadic programmer reduces his ambition to merely being left alone to wander the remaining wilderness. However, this option is no more real for him now than the option of being left alone was available to the settler in the old days. Back then, the settlers were never safe since a nomad could always bump into them in an attempt to solve a related problem, and if their stuff got in the way, he&#8217;d rewrite or delete/disable their stuff. Now it is the nomad who is never safe since the nation states keep expanding their responsibilities into neighboring problems - having enough people to have some of them free for that some of the time.</p>
<p>(Actually having even partially idle workers on a team leaves few satisfying alternatives to an attempt at expanding the team&#8217;s responsibilities since other teams are always happy to seize an idle worker. Likewise, back in the old days the nomadic programmer had few satisfying alternatives to invading and solving others&#8217; problems since otherwise he couldn&#8217;t keep his promises to deliver. It&#8217;s not (just) the intentions that fuel wars, it&#8217;s (also) the situation.)</p>
<p>The nation states seeking to expand won&#8217;t fight each other since the nomad is a much easier target, not having the resources (time and reports) to look over his entire territory. Once a nation state team has managed to take over some of that ever-shrinking territory, the nomad will never gain it back. Increasingly, the nomad has to reach compromises with neighboring nation states whenever his work is related to their work. Then it turns out that in order to be able to work on what he wants at all, he has to do it the way a chief commander or an officer of a nation state team wants him to do it - and then that in order to <em>work</em> on anything at all, he has to report to such a manager.</p>
<p>At this point the nomadic programmer can use his reputation and seniority to get pseudo-promoted to a non-productive position. Alternatively, he can actually become a report of a nation state team manager with whom the relationship is likely already strained - and his seniority, reputation and ambitions won&#8217;t make the transition into this particular position of a report any smoother. Alternatively he can quit. His failure is now complete.</p>
<p>(It may sound like a natural thing for a nomad to change jobs fairly frequently - part of a lifestyle rather than the failure of that lifestyle. However, nomadic programmers are those who like to travel from problem to problem - not from job to job; some like the latter but some don&#8217;t. A new job at a new place means a temporary, but possibly significant loss of confidence and efficiency. An African nomad won&#8217;t necessarily welcome a relocation to Alaska.)</p>
<p>As I&#8217;ve said above, I have reasons having to do with my personality to side with the nomadic programmer, especially at the stage of mounting pressure from nation state teams. The people I tend to relate to most easily seem to be those who prefer freedom to power. A talented freedom-seeker with a strong sense of responsibility will accumulate, well, responsibilities much more quickly than reports - a lot of territory to wander, and no standing army to protect it. (The problem with reports is that you take their freedom by telling them what to do and they take your freedom through your responsibility for their actions; who wants reports?) Since many freedom-lovers disdain politics, they won&#8217;t respect international borders - a problem should be solved, dammit; hence they&#8217;re likely to initiate invasions.</p>
<p>However, while this means that I personally will tend to find myself sympathizing with particular nomadic programmers, this does not mean that theirs is the right way or something. For example, it is unclear which share of programming problems out there can really be &#8220;solved&#8221; - grazed through and left alone - and which problems actually require continuous care and gardening that a true nomad is not likely to supply. Also, whether there&#8217;s a &#8220;solution&#8221; you only need to &#8220;maintain&#8221; or an &#8220;infrastructure&#8221; you want to &#8220;extend&#8221;, the code needs a permanent owner. I <a href="http://www.yosefk.com/blog/redundancy-vs-dependencies-which-is-worse.html">don&#8217;t believe</a> in <a href="http://www.yosefk.com/blog/extreme-programming-explained.html">collective code ownership</a> any more than in collective ownership of anything else - what it usually means is that everybody collectively fights over something. Therefore I think that ownership should generally be respected, and so a compromise which is, from a technical viewpoint, quite moronic, can otherwise be a great thing - a belief outside the nomad&#8217;s way.</p>
<p>So while I know where my sympathies lie, I don&#8217;t know which camp I&#8217;m in and this is why this metaphor doesn&#8217;t come with any conclusions, just the dynamics. In fact I&#8217;d rather leave it without conclusions but I wouldn&#8217;t mind expanding more on the dynamics. For example, some - but not all - settled civilizations were actually started by nomads enslaving agricultural villagers and settling among them. Apparently a similar distinction can be made between nation state teams of programmers; it is then interesting whether differences in their behavior can be traced to their different origins. Perhaps a person more entertained than appalled by the sort of perspective on the adventurous lives of programmers here presented is also the kind of person more entertained than appalled by the history of mankind in general and so could help develop this line of thought based on his knowledge of history. Could be fun.</p>
<p><strong>Update</strong> (2009-08-18) - <a href="http://www.simple-talk.com/opinion/geek-of-the-week/chuck-moore-geek-of-the-week/?utm_source=simpletalk&amp;utm_medium=email&amp;utm_content=ChuckMooreGOTW20090817&amp;utm_campaign=Opinion">Chuck Moore</a>: &#8220;I’ve met too many people who want to make a career out of a project instead of completing it&#8221; - the nomad&#8217;s view of the settlers. Nomadism is apparent in other writing by Chuck Moore - his disdain for &#8220;complexity&#8221; (which implies dependency on large teams of people you ought to manage, annoying constraints imposed by systems made by someone else and other things nomads don&#8217;t like), his firm opinion that distinct projects should have distinct code bases (customizability and &#8220;reuse&#8221; imply complexity and otherwise reduce the chances to &#8220;hermetically close&#8221; and truly complete a project), etc.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yosefk.com/blog/the-nomadic-programmer.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Humans and compilers need each other: the VLIW SIMD case</title>
		<link>http://www.yosefk.com/blog/humans-and-compilers-need-each-other-the-vliw-simd-case.html</link>
		<comments>http://www.yosefk.com/blog/humans-and-compilers-need-each-other-the-vliw-simd-case.html#comments</comments>
		<pubDate>Mon, 27 Jul 2009 20:43:22 +0000</pubDate>
		<dc:creator>Yossi Kreinin</dc:creator>
		
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.yosefk.com/blog/?p=85</guid>
		<description><![CDATA[The state of the art in optimizing compilers today is such that for optimizing code, you need (1) a strong optimizing compiler and (2) a strong optimizing human. My rule of thumb is that (1) alone will yield 2x to 10x slower code. This is also what a person selling a (great) compiler &#8220;giving 80% [...]]]></description>
			<content:encoded><![CDATA[<p>The state of the art in optimizing compilers today is such that for optimizing code, you need (1) a strong optimizing compiler and (2) a strong optimizing human. My rule of thumb is that (1) alone will yield 2x to 10x slower code. This is also what a person selling a (great) compiler &#8220;giving 80% of the optimal performance with no manual intervention&#8221; once told, off the record, to a roomful of programmers who pressed him into a corner, elevating my rule of thumb to a nobler plane of anecdotal evidence.</p>
<p>Now, I claim that this situation will persist, and in this post I&#8217;ll try to close the fairly large gap between this claim and the mere acknowledgment of what the state of the art is today. The gap is particularly large for the believer in the possibility of <a href="http://en.wikipedia.org/wiki/Strong_ai">strong AI</a> - and while <a href="http://www.yosefk.com/blog/ai-problems.html">my position</a> is a bit different, I do believe in <em>fairly</em> strong AI (can I say that? people keep telling me that I can&#8217;t say <a href="http://yosefk.com/c++fqa/defective.html#defect-2">&#8220;nearly context-free&#8221;</a>. oh well.)</p>
<p>I realize that many people experienced in optimization feel that, on the contrary, there&#8217;s no gap large enough to justify an attempt as boringly rigorous (for a pop tech blog) as the one that follows at proving what they think is obvious. But I think that many language geek discussions could benefit from a stronger bound on the power of a <a href="http://www.c2.com/cgi/wiki?SufficientlySmartCompiler">Sufficiently Smart Compiler</a> than can be derived from (necessarily vague) doubts on the power of AI, and in this post I&#8217;ll try to supply such a bound. I actually think a lot of (mainly domain-specific) things could be achieved by AI-ish work on compilation - closer to &#8220;identify bubble-sort and convert to quick-sort&#8221; than to traditional &#8220;analyze when variables are alive and assign them to registers&#8221; - and this is why it&#8217;s useful to have a feeling for when <em>not</em> to go there.</p>
<p>So, consider chess, where <a href="http://en.wikipedia.org/wiki/Advanced_Chess">the state of the art</a> is apparently quite similar to that in optimization: a strong human player using a strong computer program will take out both a human and a computer playing alone. However, it is conceivable that a program can be developed that doesn&#8217;t need the help of a human, being capable of completely simulating human thought processes, or alternative processes which are consistently superior. Why can&#8217;t it be the same with optimizing compilers?</p>
<p>(Chess and optimization are similar in another respect - few care about them; I readily acknowledge the insignificance of a 10x speed-up in a continuously expanding set of circumstances, I just happen to work in an area where it does count as a checkmate.)</p>
<p>I&#8217;ll try to show that optimization is a fundamentally different game from chess, quite aside from the formal differences such as decidability. I&#8217;ll use optimizing for VLIW SIMD processors to show where compilers outperform humans and vice versa. I&#8217;ll be quoting a book by <a href="http://en.wikipedia.org/wiki/Static_superscalar#History">the inventor of VLIW</a> called <a href="http://www.vliw.org/book/">&#8220;Embedded Computing: A VLIW Approach&#8221;</a> to support my position on the relative strength of humans and compilers in these cases. I&#8217;ll then try to show that my examples are significant outside the peculiarities of current hardware, and attempt to state the general reason why humans are indispensable in optimization.</p>
<p><strong>VLIW SIMD</strong></p>
<p>First, we&#8217;ll do the acronym expansion; skip it if you&#8217;ve been through it.</p>
<p><a href="http://en.wikipedia.org/wiki/Very_long_instruction_word">VLIW</a> stands for &#8220;Very Long Instruction Word&#8221;. What it really means is that your target processor can be told to execute several instructions in parallel. For example: <strong>R0=Add R1,R2 <em>and </em>R3=Mul R0,R1 <em>and </em>R1=Shift R5,R6</strong>. For this to work, the processor ought to be able to add, multiply and shift in parallel, that is, its execution hardware must be packed into several units, each getting distinct inputs. The units can be completely symmetric (all supporting the same operations); more often, different units support different instruction sets (so, for example, only one unit in a processor can multiply, but two of them can add, etc.) A stinky thing to note about VLIW instructions is the register semantics. In the example instruction above, R0 is mentioned both as an input and as an output. When it&#8217;s mentioned as an input of Mul its old value is meant, and <em>not</em> the value computed by Add. This is somewhat natural since the whole point is to run Add and Mul in parallel so you don&#8217;t want Mul to wait for Add; but it&#8217;s confusing nonetheless. We&#8217;ll come back to this shortly.</p>
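<p>The register semantics of a VLIW instruction can be mimicked in C by reading all the registers first and writing all the results afterwards. The sketch below is a hypothetical model rather than any real processor: it assumes 8 registers and that Shift means a left shift, neither of which the example instruction specifies.</p>

```c
#include <stdint.h>
#include <string.h>

enum { NREGS = 8 };   /* assumed register count, for illustration only */

/* Executes "R0=Add R1,R2 and R3=Mul R0,R1 and R1=Shift R5,R6".
   All operations read the register values from BEFORE the instruction,
   so Mul sees the old R0, not the sum that Add is producing. */
void run_bundle(int32_t r[NREGS])
{
    int32_t old[NREGS];
    memcpy(old, r, sizeof old);          /* read phase: snapshot registers */

    int32_t add   = old[1] + old[2];
    int32_t mul   = old[0] * old[1];     /* old R0, old R1 */
    int32_t shift = old[5] << old[6];    /* assumed: Shift = left shift */

    r[0] = add;                          /* write phase */
    r[3] = mul;
    r[1] = shift;
}
```

<p>Starting from R0=10, R1=3, R2=4, R5=1, R6=2, the instruction leaves R0=7, R3=30 (10*3, using the old R0) and R1=4.</p>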
<p><a href="http://en.wikipedia.org/wiki/SIMD">SIMD</a> stands for &#8220;Single Instruction, Multiple Data&#8221; and is known much more widely than VLIW, being available on desktop and server processor architectures like x86 and PowerPC (VLIW reigns over the quieter embedded DSP domain, the most commercially significant design probably being TI&#8217;s C6000 family.) SIMD means that you have commands like <strong>R0=Add8 R1,R2</strong>, which does 8 additions given 2 inputs. The registers are thus treated as vectors of numbers - for example, uint8[16], or uint16[8], or uint32[4], assuming 128b registers. This establishes a preference for lower-precision numbers since you can pack more of them into a register and thus process more of them at a time: with uint16, you use Add8, but with uint8, you get to use the 2x faster Add16. We&#8217;ll come back to this, too.</p>
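<p>A portable C model of the two views of one register may make the lane arithmetic concrete. It assumes 128b registers and represents them as 16 raw bytes; <code>add16_u8</code> and <code>add8_u16</code> are made-up names standing in for Add16 and Add8, and in this model each lane wraps around independently on overflow, with no carry into the neighboring lane.</p>

```c
#include <stdint.h>
#include <string.h>

/* A 128-bit register modeled as raw bytes (an assumption; real code
   would use the target's vector types or intrinsics). */
typedef struct { uint8_t bytes[16]; } vreg128;

/* "Add16": 16 independent uint8 additions, each wrapping mod 256. */
vreg128 add16_u8(vreg128 x, vreg128 y)
{
    vreg128 r;
    for (int i = 0; i < 16; i++)
        r.bytes[i] = (uint8_t)(x.bytes[i] + y.bytes[i]);
    return r;
}

/* "Add8": the same 128 bits viewed as 8 uint16 lanes, each wrapping
   mod 65536. Half as many additions per instruction as Add16. */
vreg128 add8_u16(vreg128 x, vreg128 y)
{
    uint16_t a[8], b[8], s[8];
    memcpy(a, x.bytes, 16);
    memcpy(b, y.bytes, 16);
    for (int i = 0; i < 8; i++)
        s[i] = (uint16_t)(a[i] + b[i]);
    vreg128 r;
    memcpy(r.bytes, s, 16);
    return r;
}
```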
<p><strong>Optimizing for VLIW targets</strong></p>
<p>The basic thing at which VLIW shines is the efficient implementation of &#8220;flat&#8221; loops (where most programs spend most of their time); by &#8220;flat&#8221;, I mean that there are no nested if/elses or loops. The technique for implementing loops on VLIW machines is called modulo scheduling. The same technique is used on superscalar machines like modern x86 implementations (the difference from VLIWs being the instruction encoding semantics).</p>
<p>Since I couldn&#8217;t find a good introductory page to link to, we&#8217;ll run through a basic example of modulo scheduling right here. The idea is pretty simple, although when I first saw hardware designers doing it manually in a casual manner, I was deeply shocked (they do it for designing new hardware rather than programming existing hardware but it&#8217;s the same principle).</p>
<p>Suppose you want to compute a[i]=b[i]*c+d on a VLIW processor with 4 units, 2 of them capable of load/store operations, 1 with an adder and 1 with a multiplier. All units have single-cycle latency (that is, their output is available to the next instruction; real VLIW units can have larger latencies, so that several instructions will execute before the result reaches the output register.) Let&#8217;s assume that Load and Store increment the pointer, and ignore the need to test for the loop&#8217;s exit condition. Then a trivial assembly implementation of a[i]=b[i]*c+d looks like this:</p>
<p><code>LOOP:<br />
R0=Load b++<br />
R1=Mul R0,c<br />
R2=Add R1,d<br />
Store a++,R2</code></p>
<p>This takes 4 cycles per iteration, and utilizes none of the processor&#8217;s parallelism as each instruction only uses 1 of the 4 execution units. Presumably we could do better; in fact the upper bound on our performance is 1 cycle per iteration, since no unit has to be used more than once to implement a[i]=b[i]*c+d (if we had two multiplications, for example, then with only 1 multiplying unit the upper bound would be 2 cycles/iteration.)</p>
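<p>The upper bound is a simple computation: for each kind of unit, divide the number of operations needing it by the number of such units, round up, and take the largest result. A sketch in C, assuming any operation can go to any unit of its kind and ignoring dependencies between iterations (which can impose a bound of their own); the function name is made up:</p>

```c
/* Resource-bound minimum cycles per iteration.
   uses[k]  = operations per iteration needing unit kind k
   units[k] = number of units of kind k (must be >= 1) */
int res_min_ii(const int uses[], const int units[], int nkinds)
{
    int ii = 1;
    for (int k = 0; k < nkinds; k++) {
        int need = (uses[k] + units[k] - 1) / units[k];  /* ceiling division */
        if (need > ii)
            ii = need;
    }
    return ii;
}
```

<p>For the 4-unit machine (kinds load/store, add and mul with 2, 1 and 1 units, used 2, 1 and 1 times) this gives 1; with a second multiplication it gives 2.</p>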
<p>What we&#8217;ll do now is blithely schedule all of the work to a single instruction, reaching the throughput suggested by our upper bound:</p>
<p><code>LOOP:<br />
R0=Load b++ <em>and</em> R1=Mul R0,c <em>and</em> R2=Add R1,d <em>and</em> Store a++,R2</code></p>
<p>Let&#8217;s look at what this code is doing at iteration N:</p>
<ul>
<li>b[<strong>N</strong>] is loaded</li>
<li>b[<strong>N-1</strong>] (loaded at the previous iteration into R0) is multiplied by c</li>
<li>b[<strong>N-2</strong>]*c (computed at the previous iteration from the old value of R0 and saved to R1) is added to d</li>
<li>b[<strong>N-3</strong>]*c+d is saved to a[<strong>N</strong>]</li>
</ul>
<p>This shows why our naive implementation doesn&#8217;t work (it would be quite surprising if it did) - at iteration 0, b[N-1] to b[N-3] are undefined, so it makes no sense to do things depending on these values. However, starting at N=3, our (single-instruction) loop body seems to be doing its job just fine (except for storing the result to the wrong place - b ran away during the first 3 iterations). We&#8217;ll take care of the first iterations by adding a loop header - instructions which implement the first 3 iterations, only doing the stuff that makes sense in those iterations:</p>
<p><code>R0=Load b++<br />
R0=Load b++ <em>and </em>R1=Mul R0,c<br />
R0=Load b++ <em>and </em>R1=Mul R0,c <em>and </em>R2=Add R1,d<br />
LOOP:<br />
R0=Load b++ <em>and </em>R1=Mul R0,c <em>and </em>R2=Add R1,d <em>and </em>Store a++,R2</code></p>
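<p>To check that the header plus the single-instruction loop really compute a[i]=b[i]*c+d, here is a C simulation of them. Each VLIW instruction is modeled by computing all its results from the old register values into temporaries (<code>n0</code>..<code>n2</code>) and only then updating <code>r0</code>..<code>r2</code>. The three-step trailer at the end is an addition for completeness, not from the post: one way of draining the pipeline so that no load goes past the end of b[]. It assumes n &gt;= 3, and the function name is made up.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Simulates the pipelined a[i] = b[i]*c + d on the 4-unit machine. */
void pipelined_4unit(int32_t *a, const int32_t *b, size_t n,
                     int32_t c, int32_t d)
{
    int32_t r0, r1, r2, n0, n1, n2;

    /* Header: the first 3 iterations, doing only what makes sense. */
    r0 = *b++;                              /* Load              */
    n0 = *b++; n1 = r0 * c;                 /* Load and Mul      */
    r0 = n0; r1 = n1;
    n0 = *b++; n1 = r0 * c; n2 = r1 + d;    /* Load, Mul and Add */
    r0 = n0; r1 = n1; r2 = n2;

    /* Loop body: one instruction per element, all 4 units busy. */
    for (size_t i = 3; i < n; i++) {
        n0 = *b++; n1 = r0 * c; n2 = r1 + d;
        *a++ = r2;                          /* Store uses the old R2 */
        r0 = n0; r1 = n1; r2 = n2;
    }

    /* Trailer: no more loads; the in-flight values drain in 3 steps. */
    n1 = r0 * c; n2 = r1 + d; *a++ = r2;    /* Mul, Add, Store   */
    r1 = n1; r2 = n2;
    n2 = r1 + d; *a++ = r2;                 /* Add, Store        */
    r2 = n2;
    *a = r2;                                /* Store             */
}
```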
<p>For similar reasons, we need a loop trailer - unless we don&#8217;t mind loading 3 elements past the end of b[], but I reckon you get the idea. So we&#8217;ll skip the trailer part, and move to the more interesting case - what happens when the loop body won&#8217;t fit into a single instruction. To show that, I can add more work to be done in the loop so it won&#8217;t fit into the units, or I can use a weaker imaginary target machine to do the same work which will no longer fit into the (fewer) units. The former requires more imaginary assembly code, so I chose the latter. Let&#8217;s imagine a target machine with just 2 units, 1 with Load/Store and 1 with Add/Mul. Then our upper bound on performance is 2 cycles per iteration. The loop body will look like this:</p>
<p><code>LOOP:<br />
R0=Load b++ <em>and</em> R2=Add R1,d<br />
R1=Mul R0,c <em>and </em>Store a++,R2</code></p>
<p>Compared to the single-instruction case, which was still readable (&#8220;Load <em>and </em>Mul <em>and </em>Add <em>and </em>Store&#8221;), this piece looks garbled. However, we can still trace its execution and find that it works correctly at iteration N (assuming we added proper header code):</p>
<ul>
<li>At instruction <strong>1</strong> of iteration N, b[<strong>N</strong>] is loaded</li>
<li>At instruction <strong>2</strong> of iteration N, b[<strong>N</strong>] (loaded to R0 by instr 1 of iter N) is multiplied by c</li>
<li>At instruction <strong>1</strong> of iteration N, b[<strong>N-1</strong>]*c (computed in R1 by instr 2 of iter N-1) is added to d</li>
<li>At instruction <strong>2</strong> of iteration N, b[<strong>N-1</strong>]*c+d (computed in R2 by instr 1 of iter N) is stored to a[<strong>N-1</strong>]</li>
</ul>
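<p>The same kind of simulation works for the 2-unit machine. Within each of the two instructions, neither operation reads the other&#8217;s output, so they can be written as consecutive C statements. The header (Load, then Mul) and the trailer (Add, then Store) are one possible choice, assumed here rather than taken from the post; the sketch assumes n &gt;= 1 and the function name is made up.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Simulates the 2-instruction loop body on the 2-unit machine
   (unit 1: Load/Store, unit 2: Add/Mul). */
void pipelined_2unit(int32_t *a, const int32_t *b, size_t n,
                     int32_t c, int32_t d)
{
    int32_t r0, r1, r2;

    /* Header: the first half of iteration 0. */
    r0 = *b++;                     /* R0=Load b++ */
    r1 = r0 * c;                   /* R1=Mul R0,c */

    /* Loop body: iteration N loads b[N] and stores a[N-1]. */
    for (size_t i = 1; i < n; i++) {
        r2 = r1 + d; r0 = *b++;    /* instr 1: Load and Add  */
        *a++ = r2;   r1 = r0 * c;  /* instr 2: Mul and Store */
    }

    /* Trailer: finish the last element without loading. */
    r2 = r1 + d;                   /* R2=Add R1,d */
    *a = r2;                       /* Store       */
}
```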
<p>In common VLIW terminology, the number of instructions in the loop body, known to the rest of humanity as &#8220;throughput&#8221;, is called &#8220;initiation interval&#8221;. &#8220;Modulo scheduling&#8221; is presumably so named because the instructions implementing a loop body are scheduled &#8220;modulo initiation interval&#8221;. In our second example, the operations in the sequence Load, Mul, Add, Store go to instructions 0,1,0,1 = 0%2,1%2,2%2,3%2. In our first example, everything goes to i%1=0 - which is why I needed an example with at least 2 instructions in a loop, &#8220;modulo 1&#8221; being a poor way to illustrate &#8220;modulo&#8221;.</p>
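<p>The &#8220;wrapping around&#8221; can be spelled out in a few lines of C: an operation placed at cycle t of the linear schedule goes to instruction t%II of the loop body, and t/II (rounded down) is its stage, that is, how many iterations behind the newest one it works. The function name is made up for illustration.</p>

```c
/* Maps a linear schedule (cycle per operation) onto a loop body of
   ii instructions: slot = which instruction, stage = iterations behind. */
void wrap_schedule(const int times[], int nops, int ii,
                   int slot[], int stage[])
{
    for (int k = 0; k < nops; k++) {
        slot[k]  = times[k] % ii;
        stage[k] = times[k] / ii;
    }
}
```

<p>With Load, Mul, Add, Store at cycles 0,1,2,3 and II=2, this gives slots 0,1,0,1 and stages 0,0,1,1: Add and Store work on the previous iteration&#8217;s b[N-1], as in the trace. With II=1 the stages are 0,1,2,3, matching b[N] down to b[N-3] in the single-instruction case.</p>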
<p>In practice, &#8220;modulo scheduling&#8221; grows more hairy than simply computing the initiation interval, creating a linear schedule for your program and then &#8220;wrapping it around&#8221; the initiation interval using %. For example, if for whatever reason we couldn&#8217;t issue Mul and Store in the same cycle, we could still implement the loop at the 2 cycles/iteration throughput, but we&#8217;d have to move the Mul forward in our schedule, and adjust the rest accordingly.</p>
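<p>One standard way to express such constraints is a modulo reservation table: each resource can be used at most once per slot t%II. The sketch below tests a given schedule against one on the 2-unit machine, modeling the hypothetical &#8220;no Mul and Store in the same cycle&#8221; rule as an extra shared resource (<code>PORT</code>, a made-up name). It doesn&#8217;t search for a schedule and ignores register lifetimes, both of which a real modulo scheduler must handle.</p>

```c
#include <stdbool.h>

/* Resources: the Load/Store unit, the Add/Mul unit, and a made-up
   PORT that Mul and Store share when the extra rule is on. */
enum { LS = 1, AM = 2, PORT = 4 };
enum { LOAD, MUL, ADD, STORE, NOPS };

static const int op_uses[NOPS] = { LS, AM | PORT, AM, LS | PORT };

/* t[k] = cycle of operation k in the linear schedule. Legal if each op
   starts at least 1 cycle after its producer (single-cycle latency) and
   no resource is used twice in the same slot t % ii. */
bool legal_schedule(const int t[NOPS], int ii, bool mul_store_conflict)
{
    if (ii < 1 || ii > 16)
        return false;
    for (int k = 1; k < NOPS; k++)       /* chain Load->Mul->Add->Store */
        if (t[k] < t[k - 1] + 1)
            return false;

    int mrt[16] = {0};                   /* modulo reservation table */
    for (int k = 0; k < NOPS; k++) {
        int need = op_uses[k];
        if (!mul_store_conflict)
            need &= ~PORT;
        int slot = t[k] % ii;
        if (mrt[slot] & need)
            return false;                /* resource already taken */
        mrt[slot] |= need;
    }
    return true;
}
```

<p>The straightforward schedule {0,1,2,3} is legal without the extra rule and illegal with it, while {0,2,3,5}, with Mul delayed by one cycle and the rest shifted accordingly, is legal with it at the same II=2.</p>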
<p>I&#8217;ve done this kind of thing manually for some time, and let me assure you that fun it was not. An initiation interval of 3 with 10-15 temporary variables was on the border of my mental capacity. Compilers, on the other hand, are good at this, because you can treat your input program as a uniform graph of operations and their dependencies, and a legal schedule preserving its semantics is relatively easy to define. You have a few annoyances like pointer aliasing which precludes reordering, but it&#8217;s a reasonably small and closed set of annoyances. Quoting &#8220;Embedded Computing: A VLIW Approach&#8221; (3.2.1, p. 92): &#8220;All of these problems have been solved, although some have more satisfyingly closed-form solution than others.&#8221; Which is why some people with years of experience on VLIW targets know almost nothing about modulo scheduling - a compiler does a fine job without their help.</p>
<p>The book goes on to say that &#8220;Using a VLIW approach without a good compiler is not recommended&#8221; - in other words, a human without a compiler will <em>not</em> perform very well. Based on my experience of hand-coding assembly for a VLIW, I second that. I did reach about 95% of the performance of a compiler that was developed later, but the time it took meant that many optimizations just wouldn&#8217;t fit into a practical release schedule.</p>
<p><strong>Optimizing for SIMD targets</strong></p>
<p>I will try to show that humans optimize well for SIMD targets and compilers don&#8217;t. I&#8217;ll quote &#8220;Embedded Computing: A VLIW Approach&#8221; more extensively in this section. A book on VLIW may not sound like the best source for insight on SIMD; however, I somewhat naturally haven&#8217;t heard of a book on SIMD stressing how compilers <em>aren&#8217;t </em>good at optimizing for it. But then I haven&#8217;t heard of a book stressing the opposite, either, and the success claimed by the papers on automatic vectorization that I saw was modest. Furthermore, the particular VLIW book I quote in fact focuses on embedded DSP where SIMD is ubiquitous, and its central theme is the importance of designing processors in ways making them good targets for optimizing compilers. It sounds like a good place to look for tips on designing compilers to work well with SIMD and vice versa; and if they say they have no such tips, it&#8217;s telling.</p>
<p>And in fact the bottom line of the discussion on SIMD (which they call &#8220;micro-SIMD&#8221;) is fairly grim: &#8220;The ability of compilers to automatically extract micro-SIMD without hints (and in particular, without pointer alignment information) is still unproven, and manual code restructuring is still necessary to exploit micro-SIMD parallelism&#8221; (4.1.4, p. 143). This statement from 2005 is consistent with what (AFAIK) compilers can do today. No SIMD-targeted programming environment I know relieves you of the need to use intrinsics in your C code as in &#8220;a = Add8(b,c)&#8221;, where Add8 is a built-in function-looking operator translated to a SIMD instruction.</p>
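<p>For concreteness, here is what the intrinsics style looks like with a real instruction set: on x86 with SSE2, <code>_mm_add_epi16</code> from <code>emmintrin.h</code> performs 8 16-bit additions, the equivalent of Add8 above. The sketch (function name made up) falls back to plain C when SSE2 isn&#8217;t available and for leftover elements, and uses the unaligned load/store intrinsics so that it works for any pointer.</p>

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* out[i] = x[i] + y[i], wrapping mod 65536 like the SIMD adds. */
void add_u16(uint16_t *out, const uint16_t *x, const uint16_t *y, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* 8 uint16 lanes per 128-bit register: the "a = Add8(b,c)" style. */
    for (; i + 8 <= n; i += 8) {
        __m128i vx = _mm_loadu_si128((const __m128i *)(x + i));
        __m128i vy = _mm_loadu_si128((const __m128i *)(y + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi16(vx, vy));
    }
#endif
    for (; i < n; i++)                   /* scalar fallback / leftovers */
        out[i] = (uint16_t)(x[i] + y[i]);
}
```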
<p>What I find fascinating though is the way they singled out <em>pointer alignment</em> as a particularly interesting factor necessitating &#8220;hints&#8221;. Sure, most newbies to SIMD are appalled when they find out about the need to align pointers to 16 bytes if you want to use instructions accessing 16 bytes at a time. But how much of a show-stopper can that be if we are to look at the costs and benefits more closely? Aligning pointers is easy, producing run time errors when they aren&#8217;t is easier, telling a compiler that they are can&#8217;t be hard (say, gcc has a __vector type modifier telling that), and alternatively generating two pieces of code - optimized for the aligned case and non-optimized for the misaligned case - isn&#8217;t hard, either (the book itself mentions yet another option - generating non-optimized loop header and trailer for the misaligned sections of an array).</p>
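<p>The header/trailer option is easy to sketch in C. Below, a scalar header runs until the input pointer reaches a 16-byte boundary, the body then handles 8 uint16 elements (16 bytes) per step, and a scalar trailer finishes the rest. The body is plain C standing in for aligned SIMD loads, the kernel (x*2+1) and the function name are made up, and only the input&#8217;s alignment is handled; a real version would care about the output too.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* out[i] = in[i]*2 + 1. Returns the number of header elements. */
size_t scale_u16(uint16_t *out, const uint16_t *in, size_t n)
{
    size_t i = 0;

    /* Header: elements before the first 16-byte-aligned address. */
    while (i < n && ((uintptr_t)(in + i) % 16) != 0) {
        out[i] = (uint16_t)(in[i] * 2 + 1);
        i++;
    }
    size_t header = i;

    /* Body: in + i is 16-byte aligned here; one 16-byte block per step
       (where aligned SIMD loads would go). */
    for (; i + 8 <= n; i += 8)
        for (size_t j = 0; j < 8; j++)
            out[i + j] = (uint16_t)(in[i + j] * 2 + 1);

    /* Trailer: fewer than 8 elements left. */
    for (; i < n; i++)
        out[i] = (uint16_t)(in[i] * 2 + 1);

    return header;
}
```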
<p>There ought to be more significant reasons for people to be uglifying their code with non-portable intrinsics, and in fact there are. The book even discusses them in the pages preceding the conclusion - but why doesn&#8217;t it mention the more serious reasons in the conclusion? To me this is revealing of the difference between a programmer&#8217;s perspective and a compiler writer&#8217;s perspective, which is related to the difference between optimization and chess: in chess, there are rules.</p>
<p>For an optimizing programmer, SIMD instructions are a resource from which most benefit must be squeezed at any reasonable cost, including tweaking the behavior of the program. For an optimizing compiler, SIMD instructions are something that can be used to implement a piece of source code, in fact the preferable way to implement it - <em>as long as its semantics are preserved</em>. This means that a compiler obeys rules a programmer doesn&#8217;t, making winning impossible. A typical reaction of a compiler writer is to think of this as not his problem - his problem ending where program transformations preserving the semantics are exhausted. I think this is what explains the focus on things like pointer alignment (which a compiler can in fact solve with a few hints and without affecting the results of the program) at the expense of the substantive issues (which it can&#8217;t).</p>
<p>In the context of SIMD optimizations, the most significant example of rules obeyed by just one of the contestants has to do with precision, which the book mentions right after alignment in its detailed discussion of the problems with SIMD. &#8220;Even when we manipulate byte-sized quantities (as in the case of most pixel-based images, for example), the precision requirements of the majority of manipulation algorithms require keeping a few extra bits around (9, 12, and 16 are common choices) for the intermediate stages of an algorithm. &#8230;this forces us up to the next practical size of sub-word &#8230; reducing the potential parallelism by a factor of two up front.&#8221; They go on to say that a 32b register will end up keeping just 2 16b numbers, giving a 2x speed-up - modest considering all the cases when you won&#8217;t get even that due to other obstacles.</p>
<p>This argument shows the problems precision creates for the hardware implementation of SIMD. However, the precision of <em>intermediate results</em> isn&#8217;t as hard a problem as this presentation makes it sound, because intermediate results are typically kept in registers, not in memory. So to keep the extra bits in intermediate results, you can either use large registers for SIMD operations and not &#8220;general-purpose&#8221; 32b ones, or you can keep intermediate results in pairs of registers - as long as you have enough processing units to generate and further process these intermediate results. Both things are done by actual SIMD hardware.</p>
<p>However, the significant problems created by precision lie on the software side: the compiler doesn&#8217;t know how many bits it will need for intermediate results, nor when precision can be traded for performance. In C, the type of the intermediate results in the expression (a[i]*3+b[i]*c[i])&gt;&gt;d is int (roughly, 32b), even if a, b and c are arrays of 8b numbers, and the parenthesized expression can in fact exceed 16b. The programmer may know that b[i]*c[i] never exceeds, say, 20000 so the whole thing will fit in 16b. That C has no way of specifying precise ranges of values a variable can hold (as opposed to <em>Lisp</em>, of all rivals to the title of the most aggressively optimizing environment) doesn&#8217;t by itself make an argument since a way could be added, just like gcc added __vector, not to mention the option of using a different language. Specifying the ranges of b[i] and c[i] wouldn&#8217;t always suffice and we would have to further uglify the code to specify the range of the <em>product</em> (in case both b[i] and c[i] can be large by themselves but never <em>together</em>), but it could be done.</p>
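<p>The effect of C&#8217;s integer promotion is easy to demonstrate. The first function below evaluates the expression the C way, in int; the second models a 16b SIMD lane by truncating every intermediate result to uint16_t. Both names are hypothetical, and int is assumed to be 32b, as on common targets.</p>

```c
#include <stdint.h>

/* C semantics: the uint8_t operands are promoted to int. */
int filter_int(uint8_t a, uint8_t b, uint8_t c, int d)
{
    return (a * 3 + b * c) >> d;
}

/* A 16-bit lane: each intermediate is truncated to 16 bits. */
int filter_16b(uint8_t a, uint8_t b, uint8_t c, int d)
{
    uint16_t prod = (uint16_t)(b * c);
    uint16_t sum  = (uint16_t)(a * 3 + prod);   /* may wrap mod 65536 */
    return sum >> d;
}
```

<p>With a, b and c all 255 and d=4, the int version yields 4111 while the 16b version yields 15, since 765+65025=65790 wraps around to 254. With b*c limited to 20000, as in the example above, the two agree.</p>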
<p>The real problem with having to specify such information to the compiler isn&#8217;t the lack of a standard way of spelling it, but that a programmer <em>doesn&#8217;t know when to do it</em>. If it&#8217;s me who is responsible for the low-level aspects of optimization, I&#8217;ll notice the trouble with an intermediate result requiring too many bits to represent. I will then choose whether to handle it by investigating the ranges of b[i] and c[i] and restricting them if needed, by moving the shift by d into the expression as in (a[i]*3&gt;&gt;d)+(b[i]*c[i]&gt;&gt;d) so intermediate results never exceed 16b, or in some other way. But if it&#8217;s the compiler who&#8217;s responsible, chances are that I won&#8217;t know that this problem exists at all.</p>
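<p>The rewrite that moves the shift into the expression can be checked the same way. Since a right shift of a non-negative number rounds down, shifting each term separately loses at most 1 compared to the exact result, which is the kind of precision loss a person may decide to accept. The sketch assumes d &gt;= 1, which is what keeps the final sum within 16b (each product already fits, since 255*255=65025); the function names are made up.</p>

```c
#include <stdint.h>

/* Shift each term before adding: every intermediate fits in 16 bits
   for 8-bit inputs, and so does the sum when d >= 1. */
int filter_split(uint8_t a, uint8_t b, uint8_t c, int d)
{
    uint16_t t1 = (uint16_t)((a * 3) >> d);
    uint16_t t2 = (uint16_t)((uint16_t)(b * c) >> d);
    return t1 + t2;
}

/* Exact reference result, computed in int. */
int filter_exact(uint8_t a, uint8_t b, uint8_t c, int d)
{
    return (a * 3 + b * c) >> d;
}
```

<p>For example, with a=b=c=1 and d=1 the exact result is (3+1)&gt;&gt;1=2, while the split version gives 1+0=1.</p>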
<p>There&#8217;s a trade-off between performance gains, precision losses and the effort needed to obtain more knowledge about the problem. A person can make these trade-offs because the person knows &#8220;what the program <em>really</em> does&#8221;, and the semantics of the source code are just a rendering of that informal spec from one possible perspective. It&#8217;s even worse than that - a person actually <em>doesn&#8217;t know</em> what the program really does <em>until</em> an attempt to optimize it, so even strong AI capable of understanding an informal spec in English wouldn&#8217;t be a substitute for a person.</p>
<p>A person can say, &#8220;Oh, we run out of bits here. OK, so let&#8217;s drop the precision of the coefficients.&#8221; Theoretically, and importantly for my claim, strong AI can <em>also</em> say that - but only if it operates <em>as a person </em>and not as a machine. I don&#8217;t claim that we&#8217;ll never reach a point where we have a machine powerful enough to join our team as a programmer, just that (1) we probably wouldn&#8217;t want to and (2) if we would, it wouldn&#8217;t be called a compiler, it would be called a software developer. That is, you wouldn&#8217;t press a button and expect to get object code from your source code, you&#8217;d expect a conversation: &#8220;Hey, why do you need so many bits here - it&#8217;s just a smoothing filter, do you really think anyone will notice the difference? Do you realize that this generates 4x slower code?&#8221; And then someone, perhaps another machine, would answer that yes, perhaps we should drop some of the bits, but let&#8217;s not overdo it because there are artifacts, and I know <em>you</em> couldn&#8217;t care less because your job ends here but those artifacts are amplified when <em>we</em> compute the gradient, etc.</p>
<p>This is how persons optimize, and while a machine could in theory act as a person, it would thereby no longer be a compiler. BTW, we have a compiler at work that actually does converse with you - it says that it will only optimize a piece of code if you specify that the minimal number of iterations executed is such and such; I think it was me who proposed to handle that case using conversation. So this discussion isn&#8217;t pure rhetoric. I really wish compilers had a -warn-about-missed-optimization-opportunities switch that would give advice of this kind; it would help in a bunch of interesting cases. I just think that in some cases, precision being one of them, the amount and complexity of interactions needed to make headway like that exceeds the threshold separating aggressive optimization from aggressive lunacy.</p>
<p>To be sure, there are optimization problems that <em>could</em> be addressed by strong AI. In the case of SIMD, the book mentions one such area - they call it &#8220;Pack, Unpack, and Mix&#8221;. &#8220;Some programs require rearranging the sub-words within a container to deal with the different sub-word layouts. From time to time, the ordering of the sub-words within a word (for example, coming from loading a word from memory) does not line up with the parallelism in the code&#8230; The only solution is to rearrange the sub-words within the containers through a set of permutation or copying operations (for example, the MIX operation in the HP PA-RISC MAX-2 extension).&#8221;</p>
<p>An example of this reordering problem is warping: computing a[i]=b[i*step+shift]. This is impossible to do in SIMD without a permutation instruction of the kind they mention (PowerPC&#8217;s AltiVec has vec_perm, and AFAIK x86&#8217;s SSE has nothing as general - SSSE3&#8217;s PSHUFB only shuffles the bytes of a single register - so you can&#8217;t warp very efficiently). However, even if an instruction is available, compilers are AFAIK unable to exploit it. I see no reason why sufficiently strong AI couldn&#8217;t manage to do such things with few hints in some interesting cases. I wouldn&#8217;t bet my money on it - I side with Mitch Kapor on <a href="http://www.longbets.org/1">the Turing Test bet</a> - but it is conceivable, like the invincible chess-playing program, and unlike transformations requiring &#8220;small&#8221; changes of the semantics.</p>
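<p>To show what a permutation instruction buys for warping, here is a portable C model of one. <code>perm16</code> is modeled on the semantics of AltiVec&#8217;s vec_perm (output byte i is byte idx[i]&amp;31 of two concatenated 16-byte registers), but both function names are made up and no real intrinsics are used. With step &lt;= 2, the source bytes for 16 outputs fit in a 32-byte window, so one permute per 16 outputs suffices; larger steps would need more work.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two-register byte permute: out[i] = byte (idx[i] & 31) of x:y. */
static void perm16(uint8_t out[16], const uint8_t x[16],
                   const uint8_t y[16], const uint8_t idx[16])
{
    for (int i = 0; i < 16; i++) {
        int k = idx[i] & 31;
        out[i] = k < 16 ? x[k] : y[k - 16];
    }
}

/* a[i] = b[i*step + shift], 16 outputs per permute. Assumes n % 16 == 0
   and that b holds at least (n-16)*step + shift + 32 bytes, since each
   block "loads" two full 16-byte registers. */
void warp_u8(uint8_t *a, const uint8_t *b, size_t n, int step, int shift)
{
    assert(step >= 0 && step <= 2 && shift >= 0 && n % 16 == 0);
    uint8_t idx[16];
    for (int i = 0; i < 16; i++)
        idx[i] = (uint8_t)(i * step);      /* same pattern for every block */

    for (size_t j = 0; j < n; j += 16) {
        const uint8_t *src = b + j * step + shift;
        perm16(a + j, src, src + 16, idx); /* a[j+i] = src[i*step] */
    }
}
```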
<p><strong>Significance</strong></p>
<p>There are areas of optimization that are very significant commercially but hardly interesting in a theoretical discussion (and this here&#8217;s a distinctly theoretical discussion, as is any discussion where the possibility of strong AI is supposed to be taken into account).</p>
<p>For example, register allocation for the x86 is exceedingly gnarly and perhaps an interesting argument could be made to defend the need for human intervention in this process in extreme cases (I wouldn&#8217;t know since I never seriously optimized for the x86). However, a general claim that register allocation makes compiler optimization hard wouldn&#8217;t follow from such an argument: on a machine with plentiful and reasonably uniform registers, it&#8217;s hard to imagine what a human can do that a compiler can&#8217;t do better, and almost everybody would agree that the single reason for <em>not </em>making hardware that way is a commercial one - to make an x86-compatible processor.</p>
<p>Now, I believe that both SIMD and VLIW instruction encodings don&#8217;t have this accidental nature, and more likely are part of the Right Way of designing high-performance processors (assuming that it <a href="http://www.yosefk.com/blog/the-high-level-cpu-challenge.html">makes no sense</a> to move cost from software to hardware and call that a &#8220;performance gain&#8221;, that is, assuming that performance is measured per square millimeter of silicon). One argument of rigor worthy of a pop tech blog is that most high-end processors have converged to SIMD VLIW: they have instructions processing short vectors and they can issue multiple instructions in parallel; some do the latter in the &#8220;superscalar&#8221; way of having the hardware analyze dependencies between instructions at run time and others do it in the &#8220;actual VLIW&#8221; way of having the lack of dependencies proven and marked by the compiler, but you end up doing modulo scheduling anyway.</p>
<p>However, this can of course indicate uninformed consumer preference rather than actual utility (I type this on a noisy Core 2 Duo box running Firefox on top of XP, a job better handled by a cheaper, silent single-core - and I&#8217;m definitely a consumer who should have known better). So my main reasons for believing VLIW and SIMD are &#8220;right&#8221; are abstract considerations on building von Neumann machines:</p>
<ul>
<li>You typically have lots of distinct execution hardware: a multiplier has little in common with a load/store unit. Up to a point, it will therefore make sense to support parallel execution of instructions on the different execution hardware. The cost of supporting it will be more I/O ports connecting the execution units with the register file - quite serious because of the multiplexers selecting the registers to read/write. However, the cost of <em>not</em> supporting it will be more execution hardware left unused for more time. So the optimum is unlikely to be &#8220;no parallel execution&#8221;, it&#8217;s likely &#8220;judicious parallel execution&#8221;.</li>
<li>It is cheaper to have few wide registers and wide buses between the register file and the execution units than it is to have many narrow registers and buses. That&#8217;s because the cost of the register file is proportional to the product of #registers and #buses to the execution units. It is thus significantly cheaper to have 1 unit with 4 8bx8b multipliers and 2 32b buses for the inputs than it is to have 4 units with 1 8bx8b multiplier in each and 8 8b buses for the inputs. It&#8217;s also cheaper to keep 4 bytes in 1 32b register than in 4 8b registers. Likewise, it is cheaper to have 4 multipliers in 1 processor than to have 4 full-blown processor cores, because each core would have, say, its own fetch and decode logic and instruction cache - which are in fact pure overhead. So if you have a von Neumann machine with registers and buses and instruction cache, it makes sense (up to a point) to add SIMD to make the best of that investment, and this is why commercial VLIWs have SIMD, although the VLIW theory recommends more units instead.</li>
</ul>
<p>Since I believe that both VLIW and SIMD are essential for maximizing hardware performance, I also tend to think that optimizations needed to utilize these features are &#8220;mainstream&#8221; enough to support a broad claim about optimization in general. And the main point of my claim is that compilers can&#8217;t win in the optimization game, because part of that game is the ability to change the rules once you start losing.</p>
<p>Humans faced with a program they fail to optimize change the program, sometimes a little, sometimes a lot - I heard of 5&#215;5 filters made 4&#215;4 to run on a DSP. But even if we exclude the truly shameless cheating of the latter kind, the gentler cheating that goes into every serious optimization effort still requires negotiating and taking responsibility in a way that a person - human or artificial - can, but a tool like a compiler can not.</p>
<p>Modulo scheduling is an example of the kinds of optimizations which in fact are best left to a compiler - the ones where the rules are fixed: once the code is written, few rules can be added through further annotations by the author, and hence the game can be won without much negotiation with the author; although sometimes a little interrogation can&#8217;t hurt.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yosefk.com/blog/humans-and-compilers-need-each-other-the-vliw-simd-case.html/feed</wfw:commentRss>
		</item>
		<item>
		<title>Pearls of wisdom</title>
		<link>http://www.yosefk.com/blog/pearls-of-wisdom.html</link>
		<comments>http://www.yosefk.com/blog/pearls-of-wisdom.html#comments</comments>
		<pubDate>Fri, 06 Mar 2009 23:02:46 +0000</pubDate>
		<dc:creator>Yossi Kreinin</dc:creator>
		
		<category><![CDATA[wetware]]></category>

		<guid isPermaLink="false">http://www.yosefk.com/blog/?p=86</guid>
		<description><![CDATA[Proper Fixation always had more unfinished drafts than posts, but recently it&#8217;s getting ridiculous. I do have a couple of drafts I seriously intend to finish (usually the drafts which don&#8217;t make it to posthood during the first 4 hours or so go to the eternal drafthood land.) Until I&#8217;m able to think this stuff [...]]]></description>
			<content:encoded><![CDATA[<p>Proper Fixation has always had more unfinished drafts than posts, but recently it&#8217;s getting ridiculous. I do have a couple of drafts I seriously intend to finish (usually the drafts that don&#8217;t make it to posthood within the first 4 hours or so go to the land of eternal drafthood). Until I&#8217;m able to think this stuff out to the point where I can share the results of my thinking, I figured I could share the far less scarce resource of Wisdom with ya.</p>
<p>***</p>
<p>Since I&#8217;ve violated the Golden Rule of Helping Friends with their PC Problems and attempted to help a friend with his PC problem, predictably wiping out his hard drive to no avail, I&#8217;ve had many opportunities to explain the Programmer Paradox: how can a programmer fail to make a computer do as he wishes? While the difficulty of debugging a program without the source proved hard to explain to laymen, I think I&#8217;ve found a metaphor that does a good job. A programmer is to the blue screen of death what <a href="http://en.wikipedia.org/wiki/Mikhail_Kalashnikov">Mikhail Kalashnikov</a> is to a loaded <a href="http://en.wikipedia.org/wiki/AK-47">AK-47</a>: just as helpless a victim as any other mortal, except for having a profound understanding of the mechanisms of his execution.</p>
<p>***</p>
<p>I would like to get some statistics on file encryption. For example, of all the files on the planet, X% are encrypted. Of all those files, Y% will never be read by someone due to encryption. Of all those files, Z% will never be read by malicious intruders. If I could lay my hands on the value of just one of these unknowns, I&#8217;d pick Z, because at least 100-Z% of the files will never be read by their owners. I would bet on Z lying somewhere between 0 and 1.</p>
<p>***</p>
<p>One of the key traits of good code is the ease with which it can be modified. One of the key traits of bad code is the high cost of modifying it. So good code is likely to be modified, and thus to deteriorate, until it&#8217;s bad enough to become hard to change, while bad code is likely to stay bad. In short, code has a strong tendency to end up bad.</p>
<p>This may sound uselessly pessimistic, much like &#8220;It is easier to break a leg than it is to cure it, therefore, most legs end up broken.&#8221; However, I think it&#8217;s more analogous to aging: the accumulation of changes in an organism, which, observably, causes most animals to end up dead. Similarly, code that is used will be changed, code that is changed will degrade, and code that degrades beyond a certain point will die.</p>
<p>***</p>
<p>Health tends to be simpler than disease. For example, everybody can brush their teeth but few people can treat cavities. Similarly, it&#8217;s not very hard to maintain a sane development environment, but pretty hard to deal with the tide of bugs and long-lived branches resulting from a failure to do so. However, I&#8217;m generally optimistic about the chances of such cavities being treated, and as usual, the optimism is based on the pain they cause, which is a strong incentive to seek and reward treatment.</p>
<p>***</p>
<p>There&#8217;s this evolution vs Intelligent Design debate. Well, I don&#8217;t know about life on Earth, but I sure have a hard time believing in Intelligent Design in software. Code has to repeatedly survive exposure to users upon whom its fate depends. Yes, &#8220;users&#8221; can be a set containing just the author, but only if it&#8217;s honest-to-God USAGE, that is, the author has to pay a price when the program is hard to use, like not getting important things done properly. Show me a program that someone finds useful and that wasn&#8217;t subject to such evolutionary pressure, but rather was Intelligently Designed as useful.</p>
<p>I think that my intense hatred of the word &#8220;design&#8221; has to do with its prominent place in the speech of software creationists. These people are likely to constantly complain about not having enough resources to do The Right Thing in the ugly real world. They are also likely to give you software that you hate enough to wish to kill them, and be articulate enough to convince you that the problem is at your end, and fail to notice how this latter ability quadruples your desire to slash their body into square millimeter pieces.</p>
<p>***</p>
<p>I&#8217;ll conclude with an off-topic request: if you know a good text advocating a collectivistic or other kind of heterodox approach to economics, I&#8217;d be very grateful for a reference. By &#8220;advocacy&#8221;, I mean a text for laymen expressing support for a certain set of policies (as opposed to merely criticizing the effects of existing policies) - like Milton Friedman&#8217;s &#8220;Capitalism and Freedom&#8221;, for example.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.yosefk.com/blog/pearls-of-wisdom.html/feed</wfw:commentRss>
		</item>
	</channel>
</rss>
