The overblown frequency vs cost efficiency trade-off - comments

Alex Orange — Fri, 27 Oct 2017 19:50:00 +0000

P.S. By IOPs I meant integer operations/second. I just realized that IOPs normally means I/O operations per second, not integer ops/second.

Alex Orange — Fri, 27 Oct 2017 19:47:00 +0000

You seem to be confusing the best rate at which to run a given circuit with the most efficient circuit. If you want to get from point A to point B by car and your choices are a Honda Civic or a McLaren F1, the Civic is certainly going to get you there on less gas, but it can't go as fast as the F1. The Civic will have an optimal speed and, as in your argument about circuits, up to a certain point higher speed will give you higher efficiency. The F1's maximum-efficiency speed will likely be higher than the Civic's, but its efficiency will almost certainly be lower because it has a much larger engine than the Civic.

Similarly with circuits, a simple ripple-carry adder is going to be excruciatingly slow, but also likely the lowest energy per add. A Kogge-Stone adder is going to be several times faster but will take up something like 5-6x the area and 5-6x the energy per operation. This is all about the architecture of the circuit (where to use an AND/NOR/NOT/XOR/etc. gate). If you change the circuit type to something like dynamic gates you can speed things up some more, but again at the cost of more energy. Almost universally, anything that you do in a given process to speed up an operation will burn more energy, unless the original circuit was horribly designed (which real ones aren't).
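As a rough illustration of that speed-versus-size trade-off, here is a minimal Python sketch that counts carry-propagation cells and logic levels for a ripple-carry chain and for a Kogge-Stone prefix network. The function names are made up for this example, and the model is an assumption: it counts only prefix cells and ignores wiring, fan-out, and the per-bit generate/propagate and sum logic, so the ratios are indicative and are not real area or energy figures.

```python
def ripple_carry(n_bits):
    """Return (levels, cells) for a serial carry chain: one cell per bit after the first."""
    return n_bits - 1, n_bits - 1


def kogge_stone(n_bits):
    """Return (levels, cells) for a Kogge-Stone prefix network.

    Level k (k = 0 .. log2(n) - 1) combines each bit with the one 2**k positions
    below it, so it needs n - 2**k cells.
    """
    if n_bits < 2 or n_bits & (n_bits - 1):
        raise ValueError("this sketch assumes a power-of-two width")
    levels = n_bits.bit_length() - 1  # exact log2 for a power of two
    cells = sum(n_bits - (1 << k) for k in range(levels))
    return levels, cells


rc_levels, rc_cells = ripple_carry(64)
ks_levels, ks_cells = kogge_stone(64)
```

Under these assumptions, a 64-bit ripple-carry chain needs 63 cells in 63 levels, while Kogge-Stone needs 321 cells in 6 levels: roughly 5x the cells for about a tenth of the logic depth.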

By horribly designed I mean absolute mistakes like not using minimum-length gates or building very area-inefficient gates. The differences between what's going on inside a CPU and a GPU, other than process, are going to be architecture and circuit type, not layout. Likely both are going to use custom layouts. The reason GPUs are "slower" is that their computations are MUCH more parallel than a CPU's. Therefore they measure their performance in total GFLOPs, whereas a CPU measures its performance in serial GFLOPs or, more often, serial IOPs. CPU arithmetic circuits are therefore larger even taking speed into account, whereas GPUs are tuned to fit as many operations/second as possible into a given area.

So, in conclusion, your statement of "I've often read arguments that computing circuitry running at a high frequency is inefficient, power-wise or silicon area-wise or both." would be better phrased as "...computing circuitry ***capable of*** running at a high frequency..." In which case the statement that such circuits are power and area inefficient is absolutely true.

Yossi Kreinin — Sat, 06 Feb 2016 09:25:00 +0000

Both of these are true to some extent, though the SoCs of the last decade have far lower communication overhead than, say, the CPU/GPU desktop setup that is always mentioned in these cases, and powering up/down probably doesn't take much more than ~1ms (but then of course some state might be destroyed by it and need reinitialization, and there might be other costs.)

Johan Ouwerkerk — Sat, 06 Feb 2016 05:09:00 +0000

There's also the fact that a lot of this hardware tends to start out as a simple 'slave' device to a master CPU. So the bottleneck is going to be I/O between the two "domains", and a 'naive' version of your faster accelerator mostly burns these extra cycles waiting for I/O to complete.

Also, there's the fact that powering things down to lower clock speed/sleep mode and back up is not a free lunch either. So your higher clock speeds must be so much higher that this overhead in current draw and time is compensated for by the correspondingly greater time spent in low(er) power mode(s).
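To make the break-even argument concrete, here is a minimal Python sketch that compares idling through a gap with powering down during it. Every number and name below is hypothetical, and the model is deliberately crude: it assumes a fixed transition energy and time for the sleep/wake round trip, and it ignores any change in active energy from running at the higher clock.

```python
def idle_gap(period_s, work_cycles, freq_hz):
    """Time left over in one period after the work is done at the given clock."""
    return period_s - work_cycles / freq_hz


def idle_energy(gap_s, idle_power_w):
    """Energy spent if the device just idles through the gap."""
    return idle_power_w * gap_s


def sleep_energy(gap_s, sleep_power_w, transition_energy_j, transition_time_s):
    """Energy spent if the device powers down and back up within the gap."""
    if gap_s < transition_time_s:
        return float("inf")  # not enough time to enter and leave sleep
    return transition_energy_j + sleep_power_w * (gap_s - transition_time_s)


def worth_sleeping(gap_s, idle_power_w, sleep_power_w,
                   transition_energy_j, transition_time_s):
    return (sleep_energy(gap_s, sleep_power_w, transition_energy_j, transition_time_s)
            < idle_energy(gap_s, idle_power_w))


# Hypothetical device: 0.1 W idle, 1 mW asleep, 0.5 mJ and 1 ms per sleep/wake cycle.
PARAMS = dict(idle_power_w=0.1, sleep_power_w=0.001,
              transition_energy_j=5e-4, transition_time_s=1e-3)

# 8 million cycles of work every 10 ms, at two hypothetical clock rates.
gap_slow = idle_gap(10e-3, 8e6, 1e9)  # about 2 ms left over
gap_fast = idle_gap(10e-3, 8e6, 4e9)  # about 8 ms left over

slow_pays_off = worth_sleeping(gap_slow, **PARAMS)
fast_pays_off = worth_sleeping(gap_fast, **PARAMS)
```

With these made-up figures, the 2 ms gap is too short for the transition cost to be recovered, while the 8 ms gap left by the faster clock is long enough.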

Yossi Kreinin — Mon, 01 Feb 2016 09:35:00 +0000

Interesting! I updated the article. (I hope I got it right; I find it's really easy to be stupid about the simple things – forget a 2x here or a 10x there...)

Norman Yarvin — Sun, 31 Jan 2016 20:47:00 +0000

Yes, "super-linear" was what I meant — or, well, I took it for granted that the question was switching losses per amount of work done, in which case it's a simple increase. As for hard numbers, I didn't have any in my head, but a search finds this report on some explorations that Intel did where they were able to make a Pentium that could run at as little as 2 milliwatts (though only at 3 MHz; the optimum was at more like 17 milliwatts and 100 MHz):
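A quick bit of arithmetic on the two operating points quoted above, as a Python sketch (the helper name is made up for this example). Dividing power by frequency gives energy per clock cycle, which is the figure that matters for work done per joule.

```python
def energy_per_cycle_nj(power_mw, freq_mhz):
    """mW / MHz = (1e-3 J/s) / (1e6 cycles/s) = 1e-9 J/cycle, i.e. nJ per cycle."""
    return power_mw / freq_mhz


lowest_power = energy_per_cycle_nj(2, 3)   # the 2 mW, 3 MHz point
optimum = energy_per_cycle_nj(17, 100)     # the 17 mW, 100 MHz point
ratio = lowest_power / optimum
```

This comes out to roughly 0.67 nJ per cycle at 3 MHz against 0.17 nJ per cycle at 100 MHz, so the lowest-power point spends about four times as much energy per cycle as the optimum. That is presumably because leakage is paid for the whole time and is spread over far fewer cycles at the low clock rate.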

http://www.realworldtech.com/near-threshold-voltage/

Yossi Kreinin — Sun, 31 Jan 2016 11:40:00 +0000

Yeah – maybe I should have said plainly that accelerators accelerate, even if it's 50x instead of 100x; that's kinda what I meant by my vague "other architectural improvements." That's why it makes sense to leave that last, hard 2x for the next time.

Dan Luu — Sun, 31 Jan 2016 09:03:00 +0000

> So AFAIK this is why so many embedded accelerators had crummy frequencies when they started out (and they also had apologists explaining why it was a good thing). And that's why some of the accelerators caught up – basically it was never a technical limitation but an economic problem of where to spend effort, and changing circumstances caused effort to be invested into improving frequency.

This also matches my experience with non-embedded accelerators. If you're looking at (just for example) a 100x speedup, it's not so bad to target a less aggressive clock rate and take a 50x speedup with v1, which sharply reduces risk and eases schedule pressure. If that works out, then pull out all the stops for v2 or even v3.

Yossi Kreinin — Sun, 31 Jan 2016 08:44:00 +0000

One more thing is, if you're feeding off a battery and/or have trouble dissipating heat, it's beneficial to lower your frequency as much as you can lower it without the throughput falling below the threshold of acceptability – even if you can't also lower the voltage. That way, you get linear gains in switching power instead of super-linear, but in absolute terms, battery life is up and heat is down. This wouldn't be so if processors were powered down every time they finish the current bulk of work, but they aren't – in practice, waiting for the user involves a lot of non-productive switching activity and you save energy by doing this stuff slower.

The upshot is that we should see some processors in the field lowering their frequency to a much lower level than they would if all they pursued was a lower voltage.
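Here is a minimal Python sketch of that effect at a fixed voltage. All names and numbers are hypothetical, and the model assumes the clock keeps toggling while the processor waits: `idle_activity` is the fraction of a busy cycle's switching energy that an idle cycle still burns (1.0 means no gating at all, 0.0 means an ideal power-down).

```python
def window_energy(freq_hz, window_s, work_cycles, energy_per_cycle_j,
                  idle_activity=1.0):
    """Switching energy over a time window in which a fixed amount of work is done."""
    total_cycles = freq_hz * window_s
    if total_cycles < work_cycles:
        raise ValueError("frequency too low to finish the work within the window")
    idle_cycles = total_cycles - work_cycles
    return energy_per_cycle_j * (work_cycles + idle_activity * idle_cycles)


E_CYCLE = 1e-9  # joules per busy cycle, hypothetical
WORK = 2e8      # cycles of useful work needed per second, hypothetical

# No power-down: idle cycles cost as much as busy ones.
fast = window_energy(1e9, 1.0, WORK, E_CYCLE)
slow = window_energy(2.5e8, 1.0, WORK, E_CYCLE)

# Ideal power-down: idle cycles cost nothing.
gated_fast = window_energy(1e9, 1.0, WORK, E_CYCLE, idle_activity=0.0)
gated_slow = window_energy(2.5e8, 1.0, WORK, E_CYCLE, idle_activity=0.0)
```

Without power-down, dropping the clock 4x cuts the energy 4x (about 1.0 J down to 0.25 J in this toy setup), which is the linear gain. With an ideal power-down both cases cost the same 0.2 J, and the frequency no longer matters.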

Yossi Kreinin — Sun, 31 Jan 2016 08:15:00 +0000

I guess you mean that's where some of the super-linear (so cost-inefficient) increased switching losses come from (if you're increasing frequency and keeping the voltage, switching costs per unit of time also increase, but they increase proportionately to the amount of work done per unit of time so it's neutral efficiency-wise.)

And still – (1) at what frequency does it typically become necessary to increase the voltage, and (2) how much less cost-efficient is the circuit because of being able to reach a higher frequency at a higher voltage? AFAIK the answer to (1) is "pretty high" and the answer to (2) is "not much." Even when the answer to (1) is "pretty low", it means that you could beneficially make your circuit work at a higher frequency for those times when it's needed without losing much cost efficiency, and you chose not to do it because there wasn't much to gain by speeding up those rare/non-existent bursts of extraordinarily intensive, urgent work. So my main point would remain: if your peak supported frequency is pretty low, it's not because supporting a higher peak frequency would result in a worse design, but because it was uneconomical given your schedule, development budget and use case. If design effort were free and everything else were kept constant, you'd probably do it.

But it is interesting how with all that said, essentially to the extent that you can lower power dissipation by lowering the frequency and voltage, you're trading silicon area for power and these are two pretty different variables (they're costs paid at different times and circumstances.) So I wonder how pronounced this effect is if you plot it – how low can you go frequency-wise and still gain something (I never experimented very much with it for various reasons – I probably would if I were in the cellphone processor market, for instance.)

Norman Yarvin — Sun, 31 Jan 2016 04:19:00 +0000

To get the circuit to work at a higher frequency, you often have to increase the voltage. That's where the increased switching losses come from; for those, power goes as voltage squared. Increasing the voltage also increases leakage losses, but I'm not sure how those scale.
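A minimal Python sketch of that scaling, using the usual first-order approximation for switching power, P = a * C * V^2 * f. The capacitance, voltages, and frequencies are hypothetical placeholders chosen only to show the ratios; leakage is left out because its scaling is not pinned down here.

```python
def switching_power(c_eff_f, voltage_v, freq_hz, activity=1.0):
    """First-order dynamic power in watts: activity * C * V^2 * f."""
    return activity * c_eff_f * voltage_v ** 2 * freq_hz


def switching_energy_per_cycle(c_eff_f, voltage_v, activity=1.0):
    """Power divided by frequency: energy per clock cycle in joules."""
    return activity * c_eff_f * voltage_v ** 2


C_EFF = 1e-9  # effective switched capacitance in farads, hypothetical

base = switching_power(C_EFF, 1.0, 1e9)           # 1.0 V at 1 GHz
fast_same_v = switching_power(C_EFF, 1.0, 2e9)    # 2x clock, same voltage
fast_higher_v = switching_power(C_EFF, 1.2, 2e9)  # 2x clock, 20% more voltage

energy_ratio = (switching_energy_per_cycle(C_EFF, 1.2)
                / switching_energy_per_cycle(C_EFF, 1.0))
```

Doubling the clock at the same voltage doubles the power but leaves the energy per cycle unchanged. Raising the voltage by 20% to reach that clock makes each cycle cost about 1.44x as much, for about 2.88x the power in total.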

Many CPUs these days do actually change their voltage as they change their frequency, and for exactly this reason. Transmeta, I believe, pioneered this; although they're defunct, others have picked it up.