Consistency: how to defeat the purpose of IEEE floating
point
I don't know much about the design of IEEE floating point, except for the fact that a lot of knowledge and what they call
"intellectual effort" went into it. I don't even know the requirements, and I suspect those were pretty detailed and complex
(for example, the benefits of having a separate representation for +0 and -0 seem hard to grasp unless you know about the very
specific and hairy examples in the complex plane). So I don't trust my own summary of the requirements very much. That said,
here's the summary: the basic purpose of IEEE floating point is to give you results of the highest practically possible
precision at each step of your computation.
I'm not going to claim this requirement is misguided, because I don't feel like arguing with people two orders of magnitude
more competent than myself who have likely faced much tougher numerical problems than I've ever seen. What I will claim is that
differences in numerical needs divide programmers into roughly three camps, and the highest-possible-precision approach hurts
one of them really badly, and so has to be worked around in ways we'll discuss. The camps are:
- The huge camp of people who do businessy accounting. Those should work with integral types to get complete, deterministic,
portable control over rounding and all that. Many of the clueless people in this camp represent 1 dollar and 10 cents as the
floating point number 1.1. While they are likely a major driving force behind economical growth, I still think they deserve all
the trouble they bring upon themselves.
- The tiny camp doing high-end scientific computing. Those are the people who can really appreciate the design of IEEE
floating point and use its full power. It's great that humanity accidentally satisfied the needs of this small but really cool
group, making great floating point hardware available everywhere through blind market forces. It's like having a built-in
Stradivari in each home appliance. Yes, perhaps I exaggerate; I get that a lot.
- The sizable camp that deals with low-end to mid-range semi-scientific computing. You know, programs that have some geometry
or physics or algebra in them. 99.99% of the code snippets in that realm work great with 64b floating point, without the author
having invested any thought at all into "numerical analysis". 99% of the code works with 32b floats. When someone stumbles upon
a piece of code in the 1% and actually witnesses fatal precision loss, everybody gathers to have a look as if they heard about a
beautiful rainbow or a smoke suggesting a forest fire near the horizon.
The majority of people who use and actually should be using floating point are thus in camp 3. Those people don't care about
precision anywhere near camp 2, nor do they know how to make the best of IEEE floating point in the very unlikely circumstances
where their naive approach will actually fail to work. What they do care about though is consistency. It's important that things
compute the same on all platforms. Perhaps more importantly for most, they should compute the same under different build
settings, most notably debug and release mode, because otherwise you can't reproduce problems.
Side note: I don't believe in build modes; I usually debug production code in release mode. It's not just floating point
that's inconsistent across modes β it's code snippets with behavior undefined by the language, buggy dependence on timing,
optimizer bugs, conditional compilation, etc. Many other cans of worms. But the fact is that most people have trouble debugging
optimized code, and nobody likes it, so it's nice to have the option to debug in debug mode, and to do that, you need things to
reproduce there.
Also, comparing results of different build modes is one way to find worms from those other cans, like undefined behavior and
optimizer bugs. Also, many changes you make are optimizations or refaptorings and you can check their sanity by making sure they
didn't change the results of the previous version. As we'll see, IEEE FP won't give you even that, regardless of platforms and
build modes. The bottom line is that if you're in camp 3, you want consistency, while the "only" things you can expect from IEEE
FP is precision and speed. Sure, "only" should be put in quotes because it's a lot to get, it's just a real pity that fulfilling
the smaller and more popular wish for consistency is somewhere between hard and impossible.
Some numerical analysts seem annoyed by the camp 3 whiners. To them I say: look, if IEEE FP wasn't the huge success that it
is in the precision and speed departments, you wouldn't be hearing from us because we'd be busy struggling with those problems.
What we're saying is the exact opposite of "IEEE FP sucks". It's "IEEE FP is so damn precise and fast that I'm happy with ALL of
its many answers β the one in optimized x86 build, the one in debug PowerPC build, the one before I added a couple of local
variables to that function and the one I got after that change. I just wish I consistently got ONE of these answers, any of
them, but the same one." I think it's more flattering than insulting.
I've accumulated quite some experience in defeating the purpose of IEEE floating point and getting consistency at the (tiny,
IMO) cost of precision and speed. I want to share this knowledge with humanity, with the hope of getting rewarded in the
comments. The reward I'm after is a refutation of my current theory that you can only eliminate 95%-99% of the pain
automatically and have to solve the rest manually each time it raises its ugly head.
The pain breakdown
I know three main sources of floating point inconsistency pain:
- Algebraic compiler optimizations
- "Complex" instructions like multiply-accumulate or sine
- x86-specific pain not available on any other platform; not that ~100% of non-embedded devices is a small market share for a
pain.
The good news is that most pain comes from item 3 which can be more or less solved automatically. For the purpose of decision
making ("should we invest energy into FP consistency or is it futile?"), I'd say that it's not futile and if you can cite actual
benefits you'd get from consistency, than it's worth the (continuous) effort.
Disclaimer: I only discuss problems I know and to the extent of my understanding. I have no solid evidence that this
understanding is complete. Perhaps the pain breakdown list should have item 4, and perhaps items 1 to 3 have more meat than I
think. And while I tried to get the legal stuff right β which behavior conforms to IEEE 754, which conforms to C99, and which
conforms to nothing but is still out there β I'm generally a weak tech lawyer and can be wrong. I can only give you the "worked
on my 4 families of machines" kind of warranty.
Algebraic compiler optimizations
Compilers, or more specifically buggy optimization passes, assume that floating point numbers can be treated as a field β you know, associativity, distributivity, the works.
This means that a+b+c can be computed in either the order implied by (a+b)+c or the one implied by a+(b+c). Adding actual
parentheses in source code doesn't help you one bit. The compiler assumes associativity and may implement the computation in the
order implied by regrouping your parentheses. Since each floating point operation loses precision on some of the possible
inputs, the code generated by different optimizers or under different optimization settings may produce different results.
This could be extremely intimidating because you can't trust any FP expression with more than 2 inputs. However, I think that
programming languages in general don't allow optimizers to make these assumptions, and in particular, the C standard doesn't
(C99 Β§5.1.2.3 #13, didn't read it in the document but saw it cited in two sources). So this sort of optimization is actually a
bug that will likely be fixed if reported, and at any given time, the number of these bugs in a particular compiler should be
small.
I only recall one recurring algebraic optimization problem. Specifically, a*(b+1) tended to be compiled to a*b+a in release
mode. The reason is that floating-point literal values like 1 are expensive; 1 becomes a hairy hexadecimal value that has to be
loaded from a constant address at run time. So the optimizer was probably happy to optimize a literal away. I was always able to
solve this problem by changing the spelling in the source code to a*b+a β the optimizer simply had less work to do, while the
debug build saw no reason to make me miserable by undoing my regrouping back into a*(b+1).
This implies a general way of solving this sort of problem: find what the optimizer does by looking at the generated
assembly, and do it yourself in the source code. This almost certainly guarantees that debug and release will work the same.
With different compilers and platforms, the guarantee is less certain. The second optimizer may think that the optimization you
copied from the first optimizer into your source code is brain-dead, and undo it and do a different optimization. However, that
means you target two radically different optimizers, both of which are buggy and can't be fixed in the near future; how unlucky
can you get?
The bottom line is that you rarely have to deal with this problem, and when it can't be solved with a bug report, you can
look at the assembly and do the optimization in the source code yourself. If that fails because you have to use two very
different and buggy compilers, use the shotgun described in the next item.
"Complex" instructions
Your target hardware can have instructions computing "non-trivial" expressions beyond a*b or a+b, such as a+=b*c or sin(x).
The precision of the intermediate result b*c in a+=b*c may be higher than the size of an FP register would allow, had that
result been actually stored in a register. IEEE and the C standard think it's great, because the single instruction generated
from a+=b*c is both faster and more precise than the 2 instructions implementing it as d=b*c, a=a+d. Camp 3 people like myself
don't think it's so great, because it happens in some build modes but not others, and across platforms the availability of these
instruction varies, as does their precision.
AFAIK the "contraction" of a+=b*c is permitted by both the IEEE FP standard (which defines FP + and *) and the C standard
(which defines FP types that can map to standards other than IEEE). On the other hand, sin(x), which also gets implemented in
hardware these days, isn't addressed by either standard β to the same effect of making the optimization perfectly
legitimate. So you can't solve this by reporting a bug the way you could with algebraic optimizations. The other way in which
this is tougher is that tweaking the code according to the optimizer's wishes doesn't help much. AFAIK what can help is one of
these two things:
- Ask the compiler to not generate these instructions. Sometimes there's an exact compiler option for that, like gcc's
platform-specific -mno-fused-madd flag, or there's (a defined and actually implemented) pragma directive such as #pragma STDC
FP_CONTRACT. Sometimes you don't have any of that, but you can lie to the compiler that you're using an older (compatible)
revision of the processor architecture without the "complex" instructions. The latter is an all-or-nothing thing
enabling/disabling lots of stuff, so it can degrade performance in many different ways; you have to check the cost.
- If compiler flags can't help, there's the shotgun approach I promised to discuss above, also useful for hypothetical cases
of hard-to-work-around algebraic optimizations. Instead of helping the optimizer, we get in its way and make optimization
impossible using separate compilation. For example, we can convert a+=b*c to a+=multiply_dammit(b,c); multiply_dammit is defined
in a separate file. This makes it impossible for most optimizers to see the expression a+=b*c, and forces them to implement
multiplication and addition separately. Modern compilers support link-time inlining and then they do optimize the result as a
whole. But you can disable that option, and as a side effect speed up linkage a great deal; if that seriously hurts performance,
your program is insane and you're a team of scary ravioli coders.
The trouble with the shotgun approach, aside from its ugliness, is that you can't afford to shoot at the performance-critical
parts of your code that way. Let us hope that you'll never really have to choose between FP consistency and performance, as I've
never had to date.
x86
Intel is the birthplace of IEEE floating point, and the manufacturer of the most camp-3-painful and otherwise convoluted FP
hardware. The pain comes, somewhat understandably, from a unique commitment to the IEEE FP philosophy β intermediate results
should be as precise as possible; more on that in a moment. The "convoluted" part is consistent with the general insanity of the
x86 instruction set. Specifically, the "old" (a.k.a "x87") floating point unit uses a stack architecture for addressing FP
operands, which is pretty much the exact opposite of the compiler writer's dream target, but so is the rest of x86. The "new"
floating point instructions in SSE don't have these problems, at the cost of creating the aesthetic/psychiatric problem of
actually having two FP ISAs in the same processor.
Now, in our context we don't care about the FP stack thingie and all that, the only thing that matters is the consistency of
precision. The "old" FP unit handles precision thusly. Precision of stuff written to memory is according to the number of bits
of the variable, 'cause what else can it be. Precision of intermediate results in the "registers" (or the "FP stack" or whatever
you call it) is defined according to the FPU control & status register, globally for all intermediate results in
your program.
By default, it's 80 bits. This means that when you compute a*b+c*d and a,b,c,d are 32b floats, a*b and c*d are computed in
80b precision, and then their 80b sum is converted to a 32b result in memory (if a*b+c*d is indeed written to memory
and isn't itself an "intermediate" result). Indeed, what's "intermediate" in the sense of not being written to memory and what
isn't? That depends on:
- Debug/release build. If we have "float e=a*b+c*d", and e is only used once right in the next line, the optimizer probably
won't see a point in writing it to memory. However, in a debug build there's a good reason to allocate it in memory, because if
you single-step your program and you're already past the line that used e, you still might want to look at the value of e, so
it's good that the compiler kept a copy of it for the debugger to find.
- The code "near" e=a*b+c*d according to the compiler's notion of proximity also affects its decisions. There are only so many
registers, and sometimes you run out of them and have to store things in memory. This means that if you add or remove code near
the line or in inline functions called near the line, the allocation of intermediate results may change.
Compilers could have an option asking them to hide this mess and give us consistent results. The problems with this are that
(1) if you care about cross-platform/compiler consistency, then the availability of cross-mode consistency options in one
compiler doesn't help with the other compiler and (2) for some reason, compilers apparently don't offer this option in a very
good way. For example, MS C++ used to have a
/fltconsistency switch but seems to have abandoned it in favor of an insane special-casing of the syntax float(a*b)+float(c*d) β
that spelling forces consistency (although the C++ standard doesn't assign it a special meaning not included in the
plain and sane a*b+c*d).
I'd guess they changed it because of the speed penalty it implies rather than the precision penalty as they say. I haven't
heard about someone caring both about consistency and that level of precision, but I did hear that gcc's consistency-forcing
-ffloat-store flag caused notable slowdowns. And the reason it did is implied by its name β AFAIK the only way to implement x86
FP consistency at compile time is to generate code storing FP values to memory to get rid of the extra precision bits. And
-ffloat-store only affects
named variables, not unnamed intermediate results (annoying, isn't it?), so /fltconsistency, assuming it actually gave you
consistency of all results, should have been much slower. Anyway, the bottom line seems to be that you can't get much
help from compilers here; deal with it yourself. Even Java gave up on its initial intent of getting consistent results on the
x87 FPU and retreated to a cowardly strictfp
scheme.
And the thing is, you never have to deal with it outside of x86 β all floating point units I've come across, including the
ones specified by Intel's SSE and SSE2, simply compute 32b results from 32b inputs. People who decided to do it that way and rob
us of quite some bits of precision have my deepest gratitude, because there's absolutely no good way to work around the
generosity of the original x86 FPU designers and get consistent results. Here's what you can do:
- Leave the FP CSR configured to 80b precision. 32b and 64b intermediate results aren't really 32b and 64b. The fact that it's
the default means that if you care about FP result consistency, intensive hair pulling during your first debugging sessions is
an almost inevitable rite of passage.
- Set the FP CSR to 64b precision. If you only use 64b variables, debug==release and you're all set. If you have 32b floats
though, then intermediate 32b results aren't really 32b. And usually you do have 32b floats.
- Set the FP CSR to 32b precision. debug==release, but you're far from "all set" because now your 64b results,
intermediate or otherwise, are really 32b. Not only is this a stupid waste of bits, it is not unlikely to fail, too, because 32b
isn't sufficient even for some of the problems encountered by camp 3. And of course it's not compatible with other
platforms.
- Set the FP CSR to 64b precision during most of the program run, and temporarily set it to 32b in the parts of the program
using 32b floats. I wouldn't go down that path; option 4 isn't really an option. I doubt that you use both 32b and 64b variables
in a very thoughtful way and manage to have a clear boundary between them. If you depend on the ability of everyone to correctly
maintain the CSR, then it sucks sucks sucks.
Side note: I sure as hell don't believe in "very special" "testing" build/running modes. For example, you could say that you
have a special mode where you use option (3) and get 32b results, and use that mode to test debug==release or something. I think
it's completely self-defeating, because the point of consistency is being able to reproduce a phenomenon X that happens in a
mode which is actually important, in another mode where reproducing X is actually useful. Therefore, who needs
consistency across inherently useless modes? We'd be defeating the purpose of defeating the purpose of IEEE floating point.
Therefore, if you don't have SSE, the only option is (2) β set the FP CSR to 64b and try to avoid 32b floats. On Linux, you
can do it with:
#include <fpu_control.h>
fpu_control_t cw;
_FPU_GETCW(cw);
cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
_FPU_SETCW(cw);
Do it first thing in main(). If you use C++, you should do it first thing before main(), because people can use FP
in constructors of global variables. This can be achieved by figuring out the compiler-specific translation unit initialization
order, compiling your own C/C++ start-up library, overriding the entry point of a compiled start-up library using stuff like
LD_PRELOAD, overwriting it in a statically linked program right there in the binary image, having a coding convention forcing to
call FloatingPointSingleton::instance() before using FP, or shooting the people who like to do things before main(). It's a
trade-off.
The situation is really even worse because the FPU CSR setting only affects mantissa precision but not the exponent range, so
you never work with "real" 64b or 32b floats there. This matters in cases of huge numbers (overflow) and tiny numbers (double
rounding of subnormals). But it's bad enough already, and camp 3 people don't really care about the extra horror; if you want
those Halloween stories, you can find them here. The good news are that today,
you are quite likely to have SSE2 and very likely to have SSE on your machine. So you can automatically sanitize all the mess as
follows:
- If you have SSE2, use it and live happily ever after. SSE2 supports both 32b and 64b operations and the intermediate results
are of the size of the operands. BTW, mixed expressions like a+b where a is float and b is double don't create consistency
problems on any platform because the C standard specifies the rules for promotion precisely and portably (a will be promoted to
double). The gcc way of using SSE2 for FP is -mfpmath=sse -msse2.
- If you only have SSE, use it for 32b floats which it does support (gcc: -mfpmath=sse -msse). 64b floats will go to the old
FP unit, so you'll have to configure it to 64b intermediate results. Everything will work, the only annoying things being (1)
the retained necessity to shoot the people having fun before main and (2) the slight differences in the semantics of control
flags in the old FP and the SSE FP CSR, so if you configure your own policy, floats and doubles will not behave exactly the
same. Neither problem is a very big deal.
Interestingly, SSE with its support for SIMD FP commands actually can make things worse in the
standard-violating-algebraic-optimizations department. Specifically, Intel's compiler reportedly has (had?) an optimization
which unrolls FP accumulation loops and reorders additions in order to utilize SIMD FP commands (gcc 4 does that, too, but only
if you explicitly ask for trouble with -funsafe-math-optimizations or similar). But I wouldn't conclude anything from it, except
that automatic vectorization, which is known to work only on the simplest of code snippets, actually doesn't work even on
them.
Summary: use SSE2 or SSE, and if you can't, configure the FP CSR to use 64b intermediates and avoid 32b floats. Even the
latter solution works passably in practice, as long as everybody is aware of it.
I think I covered everything I know except for things like long double, FP exceptions, etc. β and if you need that, you're
not in camp 3; go away and hang out with your ivory tower numerical analyst friends. If you know a way to automate away more
pain, I'll be grateful for every FP inconsistency debugging afternoon your advice will save me.
Happy Halloween!
Beautiful and thought provoking, as usual. But... I assume
consistency means getting identical bit patterns in FP operation results
across different compilers and platforms. Now why is that goal
desirable? Why care about a binary representation of a double being one
way or another?
Thanks.... "Bit pattern" is a loaded term because it may refer to the
represented number or it may look beyond that at the actual bit string
stored in memory (in which case you start talking about endianness,
etc.). I assume we're talking about the represented number, so for
example, the question is what's wrong with having computed x=1.0001 on
one platform and 0.999 on another platform (build mode, compiler).
The problem is ultimately that x will look differently when printed,
will yield a different value if converted to an int, and if it's
compared with 1.0 in a conditional statement, a different code path will
execute. This complicates testing because you can't compare outputs of
different build settings or program versions. It also complicates
debugging because you can't reproduce problems found in a hard-to-debug
environment in an easy-to-debug environment.
Now, there's another problem β that of theoretical program
correctness. That is, if you assume that (int)x is always between 1 and
10 and once in a lifetime x actually computes as 0.999 and you have an
out-of-bounds access, than you're really screwed way beyond ease of
testing and debugging.
However, this isn't the problem of camp 3. It's the problem of camp 2
who compute precise things using precise techniques. A camp 3 person
should add a max(min((int)x,10),1) and be done with it. That's because a
camp 3 person can't/doesn't want to/shouldn't be busy thinking about
stuff like the behavior for subnormal numbers. The inherent precision of
his data and algorithms is so lame that this effort is completely
misplaced; it won't yield good results, and it costs much more than
braindead sanitizing like the min/max thing.
The bottom line is that I want things to be the same because
identical output eases testing and debugging, and not because I care
about the exact bit pattern that will be computed identically.
hur hur .. refaptoring!
According to Google, the term was coined by Steve Yegge, and still has only one Google
hit. This post is supposed to raise that number by 100%.
Google up math professor William Kahan at UC Berkeley. He was the guy
who invented IEEE 754 arithmetic. You might ask him?
I'd argue if you assume (int)x is always between 1 and 10 and once in
a lifetime it actually computes as 0.999, you are screwed no matter if
you have crosscompiler consistency in x, or not. Granted, it would not
be a bad thing to have it, but even if you do, you are still screwed
until you do the max(min) thing, or assume x is between 1-e and 10+e
instead.
I understand your desire for consistency, but there are some problems
with your claims and assumptions.
Anonymous above recommend you check out Kahan, the father of IEEE
floating point, and for anyone who has enough interest to read a blog
entry like this, let alone write one, Kahan is in fact required reading
β potentially up to reading everything on his site:
http://www.cs.berkeley.edu/~wkahan/
I read/re-read his publications every few years, and happened to have
bookmarked some that I liked last month:
"Why Can I Debug some Numerical Programs that You Can't?"
http://www.cs.berkeley.edu/~wkahan/Stnfrd50.pdf
"A Logarithm Too Clever by Half" http://www.cs.berkeley.edu/~wkahan/LOG10HAF.TXT
Also the must-read edited reprint of the paper What Every Computer
Scientist Should Know About Floating-Point Arithmetic, by David
Goldberg
http://docs.sun.com/source/806-3568/ncg_goldberg.html
Back to what I consider important misstatements, certainly IEEE tries
to avoid unnecessary imprecision, and multiple things in its design show
that, but that is different than what I consider to be an incorrect
statement, that "to give you results of the highest practically possible
precision at each step of your computation"
Nay nay. Your desire for consistency is similar to the goals of an
expert numerical analyst like Kahan, but to a first approximation,
*accuracy* is far more important than precision, and the two are not at
all equivalent, yet programmers in general tend not to think about the
difference.
Finally, most of the (interesting and valuable) concerns, approaches,
and advice you present doesn't have anything to do with IEEE floating
point. Almost all of your points have to do with with idiosyncracies of
languages, compilers, and optimizers.
...possibly aside from your ending comments about fpu_control, which
I merely skimmed and can't comment on due to the press of time.
Despite those critiques, thanks for your article. Any discussion that
makes programmers more aware of any of the zillions of issues relating
to floating point is a good public service.
P.S. I forgot to mention...when I said "Almost all of your points
have to do with with idiosyncracies of languages, compilers, and
optimizers.", I meant to say that you would have similar issues with
non-IEEE floating point, and pre-IEEE standard, there were many.
And when I mentioned the concern of experts like Kahan to have
consistency, I meant to mention the (unfortunately not all that
well-know fact) work by Kahan and his students and successors to create
a C libm (best known as fdlibm β Freely Distributable Libm), which was
"exactly" accurate, meaning that each function produced a result that
differed from an infinite-precision result by no more than 1/2 of the
final bit of precision.
In other words, Kahan with others under him produced a libm (first
distributed with BSD Unix, IIRC) that was both as accurate as possible
and had maximum precision possible, simultaneously.
What a tour de force!
Furthermore, they released that library for multiple floating point
standards, not just IEEE, but also VAX and PDP 11 etc (IIRC).
That should make anyone stop and say "hmm!"
You lost me as soon as I found out your parentheses weren't done in
Ada.
The solution that has always worked for me when running tests on
numerical code is to compare the results as numbers (rather than
strings). If the difference between the expected result and the actual
result is less than, say, 0.00001% then I consider it a pass.
This sounds a lot easier than messing with the code or worrying about
rounding.
If floating-point error has any significant effect on the result, it
would tend to indicate a poorly-conditioned numerical algorithm that
should not be used anyway.
You mispell "worms" as "warms" in two places.
Actually, group two (high-end scientific calculations) also require
consistency. Calculations need to be able to be repeated to obtain the
same results. This inconsistency that you talk about is not an issue
with IEEE floating points, it is an issue with the compilers you are
using.
To people saying that my problem is with software tools and not the
FP standard: for those on x86, the biggest problem before SSE used to be
the x87 FPU, which is almost impossible to get consistent results out
of, and AFAIK IEEE 754 allows it to be built that way because it doesn't
mind extra precision. Also, things like contraction are AFAIK explicitly
supported by IEEE, without requiring a mode where contraction is forced
to give results consistent to non-optimized mul-then-add implementation.
Also, I think the 2008 revision of IEEE 754 has a reproducibility clause
which discusses the way for hw/sw systems to provide the option of
writing code generating reproducible results. So they do acknowledge
this to be in their territory. I didn't talk about it because I tend to
care more about de-facto standards than upcoming standards, but I
thought it's worth mentioning now that you blame me for misattributing
blame.
To people saying that algorithms shouldn't "depend" on small
precision errors (where "depend" means "produce radically different
results by taking a different code path or indexing into a different
table entry", etc.) β for many, many problems it doesn't work that way
and there's no reason to think of it like that. Think about problems
without a "right" answer β for example, face recognition. Where does the
region with the face start β at pixel 397 or 398? When a person in a
crowd is obstructed by another person β should we lose the "target" at
frame 55 or 56? And what if the vision algorithm is imperfect, as they
all are, and you only find 97% of the things you should? "Noise" and
algorithmic imperfection can make different examples appear in the
false-negative 3%, so it's no longer +-1 in the pixel or frame number.
Sure, you can and should implement automated tests measuring the quality
of your code by defining tolerance criteria capable of saying "you're
about that good" without relying on exact pixel numbers. However, many
times what you care about isn't how good you are, but why you crash on
input so and so in release build. Now, if debug build is only
statistically similar to the release build, the crash will probably not
reproduce on that input. Net result: FP inconsistency makes things hard
to debug.
Now, the whole theme of "you're walking on a borderline and a slight
numerical push can make you cross it" β it happens all the time, with
examples much simpler than face recognition. It's wishful thinking to
assume that "user-visible" program results shouldn't be affected by
numerical noise or else the algorithm is "buggy".
Regarding W. Kahan, for whom I, um, Googled a lot: the fact that you
can develop an insanely precise library on top of IEEE, and that you can
fight and win the battle for consistent results on top of IEEE doesn't
contradict what I say. What I say is that you can't do it without an
extremely detailed and accurate numerical analysis. If you compute a
trig function and you manage to reach the maximal possible precision
your datatype supports, sure, you're precise and consistent. This the
kind of thing camp 2 people can do and happen to be doing, although I'm
not sure you'll always end up like that β if you could always reach the
maximal possible datatype precision with IEEE, for every algorithm, it
would be real numbers, and they aren't. But the thing is, camp 3 people
deal with data and algorithms which aren't worth the effort, because
you'll get lousy precision anyway. Those are the people who want
consistency without the, um, enormous cost of detailed numerical
analysis.
To Doug Merritt: I only noticed your first comment now since WP
cleverly put it into the moderation queue because of the links.
I'd be grateful if you explained me your argument regarding precision
and accuracy in more detail, because I think the general issue of their
difference is orthogonal to what I'm saying. In terms of this dichotomy,
"accuracy" is related to algorithm design (an approximation of sine
should actually approximate sine on all inputs β give a result close to
the real value), and "precision" is related to algorithm implementation
(the hw/sw combo running a sine approximation should compute the same
thing each time you try it whether it's computing an accurate
approximation or a crummy one; although it's not that trivial because
one thing crummy algorithms do is overflow and the like, so the
requirement of their designers to get precisely the same wrong answer
across many systems may be "illegitimate" β that's why I think that the
inability to control the exponent range on the x87 isn't a big
deal).
Anyway, in terms of accuracy/precision, when I ask for "consistency",
it's synonymous for "precision", actually the maximal possible precision
β that is, "gimme the same answer everywhere". I do that based on the
assumption that the accuracy of the basic operations of IEEE FP is good
enough to be able to design decently accurate algorithms with very
little effort in most cases (and that is true whether I use 64b regs or
80b regs). But in general, the question of analyzing the accuracy of
algorithms implemented on top of IEEE 754 is out of my scope β it's the
business of people designing them, and it was the business of the people
designing IEEE 754 who cared about the ability to design specific
algorithms accurately on top of the basic operations.
Now, the fact that I use "consistency of results" where you'd use
"precision", and that I use "precision of operations" where you'd say
"accuracy" is consistent with your claim that most programmers don't
think in terms of accuracy/precision; I'm writing for these people,
after all. I think my claims are still valid and readable though.
refaptoring!!! it's a brillliant meme, didn't hear it before
(and the article is fine as always)
thanks, but as noted in a comment above β it's Steve Y's (there 1
Google hit for it right now.)
Yossi,
Your comment #13 finally made the point of this clear in my mind. The
'off-by-one-pixel' example in image processing is a great example of why
one would need consistency.
Although, I would still argue that the 'relying-on-rounding-is-a-bug'
argument still almost always applies to continuous problem spaces.
Thanks for the great article.
It's definitely a pain point, but what's the solution? How are you
going to standardize hardware and compiler implementations? If you
managed to do that lots of people would get left out in the cold.
Wishing for consistency in computers is a bit more realistic, but still
in the same camp as biologists wishing for genetic equivalence in all
members of a species. Sure it would solve a boat load of problems but it
ain't gonna happen, and even if it did, the perfection would only exist
until the first bug was found.
Precision = how many digits you have in a number.
Accuracy = how close you are to the "real" value.
3.55555 is a more precise approximation of Pi than 3.1, but a less
accurate one.
@Gabe: as I said in the article, AFAIK the specific problem of FP
consistency is "almost solvable" on the hardware available today, so the
state of things isn't bad, it's just that you don't get consistency by
default.
@Marcel: the term "precision" is used to name many different things.
"Precision" means "#digits" in the context of
cout<<setprecision(20), it means "accuracy" as you defined it in a
lot of informal discussions (including this article except for the
comments mentioning the word "accuracy"), and in the context of "the
difference between precision and accuracy", it means the distance
between measurements/computations obtained from different trials: http://en.wikipedia.org/wiki/Precision_and_accuracy
So I don't think any of this is about the number of digits, which is
a base-dependent and hardly interesting property.
I was clarifying Doug Merritt's comment, which you seemed not to
understand ("Iβd be grateful if you explained me your argument regarding
precision and accuracy in more detail..."). It seems obvious to me that
he's using the same definition I am.
A comment on this quote about compiler optimizations that alter
precision: "So this sort of optimization is actually a bug that will
likely be fixed if reported, and at any given time, the number of these
bugs in a particular compiler should be small."
In many cases, this is not in fact a bug, but a case of using an
optimization flag that tells the compiler to introduce these
optimizations in the interest of greater speed of computation. In GCC,
the flag is -ffast-math. I believe that in some commercial compilers,
this sort of flag is on by default at higher levels of optimization.
Thus, if you rely on these optimizations being absent, you should
read your compiler documentation carefully!
I shouldn't; standard-compliant behavior ought to be the default.
Thanks for the info on -ffast-math though; I remember seeing it in
some make output, but I haven't looked it up. Now I'll be sure to tear
it out of every compiler option list I'll ever stumble upon.
Regarding the a*(b+1) => a*b+a transformation β I'm positive it
was a bug; nobody would have stuffed -ffast-math into that Makefile.
...Liked your (?) page at LtU.
[...] my channel flippinβ mind saved by Yaksah2008-12-14 β
inspiration saved by frogfish2008-12-13 β Consistency: how to defeat the
purpose of IEEE floating point saved by Jelloman182008-12-13 β as soon
as the incarnational analogy is made, oneβs indulgences [...]
Consistency is important. Code that is compiled on one system may not
produce a consistent result on another system. I am talking about code
that involves a large number of accumulated computations. If the program
is run once, it produces a result, which we average over the length of
the run. The second time we run the same code a new set of results
emerge and the average can differ by 10%. The differences start out at
the 15th significant figure and accumulate past the 8th and right down
to the first figure. That means that the code behaves differently each
time it is run, dependent on things like memory and stack. If the
program is run on the system it is compiled on, it should always produce
consistent results (but it will not). When it is run on the client's
computer it definitely will not produce the same numbers β and that is
serious if it is a weather prediction program.
Brilliant article. And for those wondering why consistent floating
point calculations are so important β many, MANY videogames rely on
complete determinism and sending only inputs per-simulation from to keep
players in sync, eg. most RTS, even some physics simulation games like
"Little Big Planet", basically any situation in which the state of the
system is too large to practically send over the network, determinism is
key!
Brilliant article. And for those wondering why consistent floating
point calculations are so important β many, MANY videogames rely on
complete determinism and sending only inputs per-simulation frame to
keep each player's game simulation in sync. For example most RTS games
use this technique (eg. "Starcraft", "Age of Empires") also some physics
simulation games like βLittle Big Planetβ do too, and many sports games
you play online as well . Basically, any situation where the state of
the system is too large to practically send over the network, you'll
find that determinism and reproducibility are key!
@Glenn: thanks; I hope my practical advice is, um, practical. These
things are a bit hard to fully research (at least for me) β hard to know
how many warts one left uncovered.
"Adding actual parentheses in source code doesnβt help you one bit.
The compiler assumes associativity and may implement the computation in
the order implied by regrouping your parentheses."
Utter and totally wrong! Read up on the C/C++ standards. A
standard-complying compiler (and in this particular issue, they all are,
unless you've opted out using some 'fast-math' compiler option) will not
change the meaning of floating-point code.
E.g. see 5.1.2.3 section 14 of the C (draft) standard.
("Rearrangement for floating-point expressions is often restricted
because of limitations in
precision as well as range. The implementation cannot generally apply
the mathematical associative rules
for addition or multiplication, nor the distributive rule, because of
roundoff error, even in the absence of
overflow and underflow.")
See Annex F for even more detail. Same applies for C++.
Um, I quote Β§5.1.2.3 in the paragraph right after the one you
mention.
Huh? I have no idea what you are trying to say with that reply.
Your original statement "The compiler assumes associativity" is
unequivocally wrong. Floating-point arithmetic is NOT associative!
Therefore no compiler "assumes associativity".
The bit where you say "Adding actual parentheses in source code
doesnβt help you one bit." is also total nonsense. If I have explicitly
added parenthesis to a floating-point expression to control the
evaluation order, the compiler cannot remove these parentheses unless
removing them leaves the expression exactly equivalent *under evaluation
as a floating-point expression*. E.g. a compiler cannot rewrite "((x β
y) + y)" into "x". So adding parenthesis DO matter!
You said "Compilers, or more specifically buggy optimization passes,
assume that floating point numbers can be treated as a field β you know,
associativity, distributivity, the works. This means that a+b+c can be
computed in either the order implied by (a+b)+c or the one implied by
a+(b+c)."
This too is complete nonsense. Floating-point numbers do not have
associativity, and therefore cannot be treated as a field. The two
floating-point expressions "(a+b)+c" and "a+(b+c)" are NOT
equivalent!
I think you need to brush up on floating-point arithmetic a bit. See
the Accuracy Problems section of the wikipedia page
http://en.wikipedia.org/wiki/Floating-point
for an example of how floating-point numbers are not associative.
I said "buggy optimization passes". If you've never
run into a compiler bug of that sort, that's fine, and the fact that you
know that such behavior is illegitimate and therefore is indicative of a
compiler bug is also fine, but both add little to the discussion. The
fact that I knew everything you just said is completely obvious from TFA
to everyone who bothered to R it.
I know this article is 4 years old, but it takes me back 25 years
when I worked on Fortran compilers for early 386/387s. We would dread
the Gold* support call from a research scientist who would report his
program worked perfectly in debug mode, but crashed in release mode. At
Gold* support we knew we were about to do a full numerical analysis of
the code to find where an 80 bit number had gone down one particular
"if" branch of doom when its 64 bit debug alter-ego had gone down the
"else" path of salvation.
25 years later I am the research scientist. I have the advantage of
being forwarned and forearmed, so write code and tests that are
tollarent of such eccentic behaviour. This is the general solution, as I
often write complex/fast/precise algortithms which can I compare to
easy/slow/naive versions. As the number and sequence of FP operations
are quite different the answers will never be exactly the same even if
the IEEE fp and the compiler were friendlier, and I don't care provided
they match within my calculated tollerence.
@JC: well, someone like you who is in the tiny minority of people who
actually know what they're doing when they reach for floating point
probably doesn't care about things like consistent behavior between
debug and release...
It is highly worth mentioning that this is an incredibly annoying
problem for game developers. Eg. multiplayer real-time strategy games
often depend on their simulations working precisely the same on all
machines. If they don't, it is a synchronization error and the ongoing
game is cancelled.
This means that for a multiplatform game, one is practically forced
to use software floating or fixed point for everything that can affect
the outcome of the game, diminishing the available performance an order
of magnitude lower than a predictable hardware implementation would
allow.
I wrote the article a long time ago, but I think that if you're on
x86, then you can ask for double precision instead of extended and then
ask for SSE to be used for single precision and basically you get what
Sun calls "single/double system" instead of an "extended-based system"
in its manuals and it's deterministic enough; and you ask on all
platforms that the compilers doesn't use fused MACs and such, and you
don't use library functions like log or exp, and you're basically all
set.
I'm not sure this covers it all but it sounds as if this is better
than using emulation; maybe I miss something big making the use of
hardware floating point hopeless.
Incidentally, I'm planning on writing a follow-up on this one; I've
studied the issues in more detail, and I think I'm in a reasonably good
position to argue with some of the ideas behind IEEE. I think this
portability business is just botched up, for one thing (ironically this
is pointed out in an appendix to the paper one of the commenters above
linked to, the piece by Goldberg), and then nans, infs, denormals and
rounding modes all have their very notable drawbacks which I think are
worth discussing.
Great article. I've been looking into IEEE 754, trying to figure out
how to support it in sane way in a high-level language. I've had a lot
of the same thoughts along the way but this goes into much more detail.
Looking forward to the follow-up.
It seems to me that if you are doing things that are critical like
this it might be better just to write it in inline assembly, no?
I dunno, I never prefer inline assembly to anything if there's an
alternative. You can cross-compile it and you can't add printfs to it,
for starters, which are big turnoffs for me...
Precision, accuracy and consistency are ALL different things.
The standard meaning of precision isn't about consistency, but simply
bit-depth, or number of decimal places/significant figures. I agree with
your description of accuracy β actually approximating a sane
representation of the real value, but I'd say that of the two, accuracy
is actually closer in meaning to 'consistency' than 'precision' is.
PI = 3.141592222222234444444444101010101 is highly precise.
So is PI = 8.000000000000000000000000008.
But they are inconsistent and neither is accurate, to the extent that
the second one looks like it was measured experimentally by someone
living in a centrifuge with a tangential velocity approaching the speed
of light... or by a drunk monkey.
So the term 'over-precise' is commonly used to refer to a number
whose precision is not _justified_by_its_accuracy_. Why quote 7 more
digits beyond the first inaccurate one?
A good example comes from our good friend Commander Data. He's
regularly over-precise, most commonly by quoting durations to within far
less than the time it takes to say the sentence.
I actually prefer some inconsistency in order to test the rigour of
my algorithms. I do not suffer from errors showing up in a hard-to-debug
environment, while not appearing in an easy-to-debug environment. If my
algorithm isn't tolerant of error, I find that symptoms of some kind
will show up on every platform. In other words I tend to agree with an
earlier comment, "If floating-point error has any significant effect on
the result, it would tend to indicate a poorly-conditioned numerical
algorithm that should not be used anyway."
HOWEVER... I don't mean to answer a "How can we?" with a "What's the
point?" I hate it when people do!
I can imagine situations where consistency would be highly desirable,
for the debugging reasons you suggest. It IS an interesting problem, to
try to achieve cross-platform consistency for every result. Applications
in both chaos theory and rendered-on-the-fly machinima spring to mind.
And especially chaotic machinima :)
What I AM saying is that I think it's desirable more rarely than you
suggest, and that if you're spending many afternoons trying to achieve
floating point consistency, then perhaps your algorithms aren't tolerant
and rigorous enough, and you need to approach a lot of things with a
fundamentally different mindset.
____________________
I also think there's a lot more cross-over between camps 2 and 3 than
you've implied. The physics of 3D computer games will remain extremely
flaky compared to Box2D, for a while yet, because of the deep
computational problems that arise. But we will get there, thanks to the
large, keen, oft-bespectacled crossover between camps 2 and 3!
Is developing a limited, performance optimised CAS library ivory
tower stuff? With rational and surd data types, perhaps? Algebraic
manipulation of multi-dimensional polynomial expressions? Of implicit
equations based on said expressions? To many people these things might
sound abstract and high-end, but to me it sounds like the underpinnings
of a damn fine NURBS renderer, or a game physics engine to be reckoned
with.
(I'm not talking Mathematica or TI-89 levels of versatility here. I
mean the kind of limited, optimised CAS that helps you render one curved
surface much more quickly than a million flat ones. Pixar's technique of
tesselating curves into micro-polygons, as required for a particular
camera position, has already gone way beyond the point where a limited
CAS could be faster.)
____________________
I do like your article, and it's impressive that it was the Box2D FAQ
that led me to it. It's thoughtful and thought provoking, reveals issues
that hadn't occurred to me, and for me at least, it scares just the
right amount with heavy programming jargon! Keeps me on my toes!
Generally, everyone here seems to know more about computer
architecture than I do. I am a mathematician, naturally concerned with
cosmic truths, and not naturally that interested in the parochial
specifics of 21st century Earth computers. I try hard to correct this
imbalance, so that I can earn money, and so I can use my mathematical
and algorithmic cleverness to program graphics and physics. In general
learning ability, I feel as overshadowed here as I would at a conference
of chemical engineers. I wish I were better at what Ernest Rutherford
derided as 'stamp collecting'; I'm a mental gymnast, but an
ignoramus.
Thankyou for helping me to change that, here and there.
Cheers,
Tom
JC is an interesting case of a thoughtful, intelligent commenter, who
has somehow become a research scientist without being able to spell
tolerance! What mad mixtures we are.
Hypocrisy? Nay sir! If my owne spelling of summe wordes seemeth
strange, it be becauseth I amme fromme the olde worlde :)
________________
Also, good comments there from Amateur Gamedev and Glenn Fiedler,
(and probably others) about multiplayer gaming. I suppose that's a more
normal thing to mention than my thing about rendered-on-the-fly
machinima! I meant the term to include multiplayer gaming, but I'm not
sure that's obvious.
Synchronisation problems are generally quite easy to do badly, and
rarely done well. I can imagine a lot of developers tripping here, and
even so-called 'AAA' games having problems with it. I still don't quite
believe Amateur Gamedev, however, regarding "...one is practically
forced to use software floating or fixed point for everything that can
affect the outcome of the game, diminishing the available performance an
order of magnitude lower..." Of course I like to think that I'M clever
enough to get around it β and practically at that!
Sorry to be vague here with 'variations on the theme of' but it's a
big subject:
Variations on the theme of one authoritative game server to correct
erroneous values on less authoritative machines can surely do the trick;
if algorithms are reasonably error-tolerant between synchronisations,
corrections can surely be imperceptible. I'm speaking quite generally,
and specifics of this approach would vary, going way beyond the scope of
this conversation.
________________
Here's a potentially good one to add (unless someone already said it,
and I missed it...)
Has anyone experienced the pain of inconsistent floats with a
distributed computing project? For instance, what about the
reproducibility of a particularly significant result in scientific
calculations? I can imagine even a Monte Carlo method storing/reporting
any list of random numbers it used, or at least the list's seed. Thus,
for instance, the algorithm producing a claimed low energy protein
conformation could be independently reproduced and analysed for whatever
reason; the same goes if we're rounding at random throughout a long
calculation, to decrease systematic rounding errors, (an example of a
difference between accuracy and consistency.) But such checks could be
scuppered by inconsistent floats among various platforms. To my mind,
that's potentially much harder to solve than the game synchronisation
problem β if it's a problem that needs solving, that is. I don't know;
I'm not a Rosetta@HOME developer!
________________
How does all this CPU discussion pertain to GPUs? Are they more, or
less, consistent in their floating point behaviour?
________________
Fast, accurate emulation is another area that occurs to me as one
that could do with hardware-based, yet consistent, floating point
behaviour. Any experience of that, anyone?
________________
And finally, sorry I can't be more help on the answer to how one
achieves more consistency. Hopefully my questions are worthwhile though,
and hopefully I've done something to justify my earlier point that it
feels bad to answer ANY "How do we do this?" with a simple "Why?" or a
"But you don't want to!" We already have a few of those answers! They're
well justified and probably true as far as they go, but they're a bit
unsatisfying all the same. Even the VERY wise cannot see all ends,
Frodo!
Regards,
Tom
P.S. Another example of on-the-fly machinima would be multi-angle
game replays, most commonly seen in racing games. The Super Monkey Ball
games make small errors here, but only with banana collection so far as
I've noticed. That's running on the same platform again of course, so
the floating point pain in not the cause, but we can imagine a game
saving replays, then having problems after a hardware or O/S upgrade, or
problems with sharing such replays online. Perhaps consoles seem to do
that sort of thing more, because it's more feasible with a consistent
platform?
And for this problem, my 'variation on the theme' of an authoritative
game server would be a pre-computed set of authoritative simulation
frames.
Post a comment