Peak CPU and the next computing revolution

Forget Big Data. Here is the Next Big Thing in computing.

First, the central fact: We are hitting the physical limits of processor speed.

This ought to reset your expectations of future technological progress. You already intuitively understand Moore’s Law (computing power roughly doubles every two years): When you had your first mobile phone, you couldn’t conceive of the iPhone. It’s like we live on two different time-scales, the social one and the technological one. Perhaps you remember many things very clearly before 1988, but do you remember what studying, working, shopping, networking was really like before the internet? Also, 1998 doesn’t seem ‘way back when’, but in terms of computing it was a very different world from today: Home-PCs were running at a clock-speed of around 133MHz, using some 100-200W of electric power. The iPhone’s chip today is ten times faster, a hundred times smaller, and needs a thousand times less power.

That progress will not repeat itself. Not merely not at that pace: never in your lifetime. With any technology conceivable today, not even a super-computer’s chip will ever be 10x faster than your iPhone’s chip. Well, to be precise: No processing core will ever run at 13GHz. You will never be able to do 13 billion computations a second (10x more than your iPhone) on a single thread. Unless some computations can run simultaneously, you are hitting a wall.

We should have seen it coming: Pat Gelsinger of Intel delivered a keynote speech in 2001, saying that if Moore’s law continued and chips continued to be based on their then-current design path, they would be as hot as a nuclear reactor by 2010, and as hot as the surface of the sun by 2015. In the future, he said, “performance would have to come from simultaneous multi-threading”, possibly supported by putting multiple cores on a single chip.

This is what happened. Today we have dual-core processors in mobiles and laptops, and quad, six-core, even twelve-core processors in PCs. However, most current software does not use these multiple cores efficiently, and most of the cores sit idle most of the time. If you need the speed, you are caught between a rock and a hard place: invest in very marginal increases of clock-speed (which means purchase price, power consumption, heat generation, and weight of the cooling components will sky-rocket), or invest in more parallel cores (cheaper, cooler, and lighter, but most software will let this added resource go to waste).

But if you have the full power of parallelisation in your software, the news is literally 100 times better than the multi-core CPUs would make you think: Your computer’s GPU (graphics processing unit) probably has 100 times more cores than its CPU.

Why is a ‘gaming’ PC so much more powerful than a ‘standard’ PC? The difference is that it is designed mostly for fast graphics rendering (shading, rotations, spatial translation, reflections, representations of surface textures, etc.), tasks which are both computationally demanding and easily parallelisable. A lot of the gaming PC’s price-tag derives from a powerful, massively parallel graphics card (GPU). For a smallish increase in the PC’s cost, you literally get hundreds of parallel processing cores on the GPU. They are so cheap because they are about four times slower (in terms of clock cycles per second) than the fastest CPU. Operating this far from the ‘bleeding edge’ saves a lot of money, energy (literally), heat-loss, and weight. NVIDIA, a GPU designer, made their general-purpose device programming language (CUDA) public in 2008. Since then, graphics card architectures have spawned derivatives called high-performance computing (HPC) cards.

I just built myself a machine with two such cards in it: it can do seven trillion floating point operations a second; it has nearly 5,000 cores (slow, therefore ‘cool’ and cheap ones, but hundreds of times more in number than a multi-core CPU will sport), and it is seven times faster than the Cray T3D in Edinburgh, which was the second-fastest computer in the world 20 years ago (I ran some particle physics research on it at the time, fun days!). For things that really are parallelised, it can be 100-150 times faster than a very fast PC, at not even twice the cost. Per core, the chip costs less than a dollar, and yet no amount of money will ever in your lifetime buy you a core 10x faster than just one of those.

So why are consumers buying super-fast multi-core CPUs at a cost of a thousand dollars or more? It’s because software engineers aren’t yet good at making two or more things run at the same time and then making these ‘threads’ come together nicely at the same time too. Even where things are parallelised, cores often sit idle, waiting for the slowest thread to finish.
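To put a number on those idle cores, here is a minimal Python sketch of what happens when a parallel step has to wait for its slowest thread. The function name `utilisation` and the thread durations are made up for illustration; they are not measurements.

```python
def utilisation(thread_times):
    """Fraction of core-time spent working when every core must wait
    for the slowest thread before the program can continue."""
    wall_time = max(thread_times)              # the step ends when the slowest thread ends
    busy_time = sum(thread_times)              # core-seconds of useful work
    available = wall_time * len(thread_times)  # core-seconds the cores were occupied
    return busy_time / available


# Hypothetical durations (in seconds) of four threads doing one parallel step.
balanced = [1.0, 1.0, 1.0, 1.0]
unbalanced = [1.0, 1.0, 1.0, 4.0]

print(utilisation(balanced))    # 1.0: no core waits
print(utilisation(unbalanced))  # 0.4375: more than half the core-time is spent waiting
```

One slow thread out of four is enough to waste more than half of the hardware, which is why dividing the work evenly matters as much as dividing it at all.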

So here is my prediction:

We will see a massive shift in IT development resources, away from hardware towards software.

Everything that can be parallelised will be parallelised. When I can buy a 50cc engine for 1/10,000th of the price of a five-litre engine, it really pays to start working on making smaller engines work together. Similarly, when hardware parallelisation is hundreds of times cheaper than hardware acceleration, Moore’s Law has to break down in terms of clock speed: We will use our IT brainpower for working out ways of making more cores work in parallel. This is what is required to keep Moore’s law in place for both the dollar-cost and the energy-cost of any given computing task: They will continue to drop dramatically from where things stand today. Clock speed will not change, and we may even see a trend downwards once the right software tools are widely available.

We long ago passed the point where the price differential between fast and slow hardware, in terms of clock-speed, became near-astronomic. Even consumer CPUs are operating at the outer limits of what’s possible, with extremely diminishing returns. A 1 GHz GPU core costs less than a dollar. A 2.0 GHz CPU and a 2.6 GHz CPU have a price differential of $300. And a 3.3 GHz CPU is more expensive still, by around $1000. So we will see dollar and energy costs of computing retreat massively, but only once the software has caught up with the new paradigm.
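The diminishing returns are easier to see as a price per extra GHz. The short Python sketch below uses only the figures quoted above, and it assumes that the $1000 comes on top of the price of the 2.6 GHz part. The function name is my own.

```python
def dollars_per_extra_ghz(low_ghz, high_ghz, extra_dollars):
    """Marginal price of clock speed between two CPUs."""
    return extra_dollars / (high_ghz - low_ghz)


# (slower clock, faster clock, price difference in dollars), as quoted in the text.
# Assumption: the second step is measured from the 2.6 GHz part.
steps = [(2.0, 2.6, 300), (2.6, 3.3, 1000)]

for low, high, extra in steps:
    price = dollars_per_extra_ghz(low, high, extra)
    print(f"{low} -> {high} GHz: about ${price:.0f} per extra GHz")
```

Under that assumption the first step costs roughly $500 per extra GHz and the second roughly $1,400, against less than a dollar for an entire 1 GHz GPU core.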


P.S. (for purists only): Three clarifications, on the role of quantum computing, on Big Data, and on the role of the ‘host-CPU’ thread. We start with the last:

(I) Current computer architecture, even with massive parallelisation, forces strict serialisation of those tasks that form part of the central command structure ‘running’ the computer. There is a difference between ‘doing’ computations (which may be parallelisable) and ‘running’ the computer (in the sense of coordinating all parallel threads, which is ultimately not parallelisable). There is always an ‘Eve-thread’ that controls all the tasks running on other cores. Control can be indirect (Eve may be a grand-parent, as it were), but at the root of the tree is always one Eve. This is not a theoretical requirement. It’s just how computer architecture has evolved and how it is now available to us commercially. Moving away from this paradigm, however, will narrow down the crowd of available software engineers even further. Genetic algorithms, genetic programming, and neural networks are all examples of paradigms where this is not necessary. Cellular automata and the mammalian brain are other examples, so one clearly can contemplate computing with a completely decentralised command structure. But until we do that, the Eve-thread will always put a limit on things: there has to be a CPU (as in central processing unit), and it can’t run faster than the 5-10GHz hardware limit. The human brain is the complete opposite: it is extremely decentralised, and the ‘clock speed’ of neurons is an abysmal 200 Hz or less, some 10 million times slower than a CPU. If one neuron (or even a relatively small group of neurons taken together) were to control all the others, directly or indirectly, the others would spend most of their lives sitting idle, waiting for instructions.
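The limit imposed by the Eve-thread can be put in numbers with Amdahl’s law. I am bringing it in here as the standard way to express this. If a fraction s of the work is strictly serial, then n cores give a speed-up of at most 1 / (s + (1 - s) / n). The sketch below is illustrative only: the function name and the serial fractions are assumptions, and the 5,000 cores echo the machine described earlier.

```python
def amdahl_speedup(serial_fraction, cores):
    """Upper bound on the speed-up when `serial_fraction` of the work must
    run on a single thread and the rest splits evenly over `cores`."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


# Hypothetical serial fractions for the Eve-thread's share of the work.
for s in (0.0, 0.01, 0.10):
    print(f"serial fraction {s:.2f}: at most {amdahl_speedup(s, 5000):.1f}x faster")
```

With no serial work at all, 5,000 cores give the full 5,000-fold speed-up. With just 1% of the work stuck on the Eve-thread, the same cores cannot deliver even a 100-fold speed-up, however many more of them you add.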

(II) Quantum computing has great promise in theory, but even the theory goes only as far as those tasks that can already be massively parallelised in a standard fashion, for example, cracking RSA encryption. No-one can envisage at this stage, even in theory, a computer that uses quantum computing as the ‘command’ structure (irrespective of whether the command is centralised or not). So even if quantum computing were technically feasible today, it would only ‘do computations’, and not ‘run the computer’. The ‘Eve-thread’ problem of item (I) above remains.

(III) Big Data (I insist that you do yourself the favor and read the Wikipedia definition; it’s the only well-written one I can find out there in the whole wide www), although definitely an over-used buzz-word, is also the most promising area with regard to all of the above, as it is full of tasks that can be parallelised: It’s easier to look for a needle in a haystack when 50,000 people are dividing up the haystack first, and it generally doesn’t matter in which order you search through the sub-portions of the stack. Searching is the easy one. Sorting is already much harder to parallelise in practice. Try this thought experiment: Give three people instructions to sort one stack of cards in such a way that together they are faster than one would be alone. Would the instructions differ if the cards aren’t standard playing cards, but cards with nothing but random numbers on them? Would they differ again if we knew something (rather than nothing) about the statistics of these numbers (like their mean and standard deviation)? Efficiency of sorting algorithms differs greatly, depending on the prior knowledge you have of the statistics of the things you are trying to sort. Again, not a problem that a typical software engineer is grappling with in their daily life.
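For the curious, here is one possible answer to the thought experiment, as a rough Python sketch rather than a recommended implementation. The search simply divides up the haystack. The sort is a bucket sort that uses an assumed mean and standard deviation, and assumes the numbers are roughly normally distributed, to hand each of the three workers about a third of the cards. All function names are my own. In standard CPython the threads will not truly run simultaneously, so this shows the division of labour rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import NormalDist


def parallel_contains(haystack, needle, workers=4):
    """Split the haystack into slices and search each one independently;
    the order in which the slices are searched does not matter."""
    size = -(-len(haystack) // workers) or 1   # ceiling division, at least 1
    slices = [haystack[i:i + size] for i in range(0, len(haystack), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return any(pool.map(lambda part: needle in part, slices))


def parallel_sort(cards, mean, stdev, workers=3):
    """Bucket sort: choose cut points from the assumed statistics so each
    worker gets a similar share, sort the buckets separately, then lay
    them end to end."""
    dist = NormalDist(mean, stdev)             # assumption: roughly normal values
    cuts = [dist.inv_cdf(k / workers) for k in range(1, workers)]
    buckets = [[] for _ in range(workers)]
    for card in cards:                         # dealing out the cards is still serial
        buckets[sum(card > cut for cut in cuts)].append(card)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_buckets = list(pool.map(sorted, buckets))
    return [card for bucket in sorted_buckets for card in bucket]


if __name__ == "__main__":
    import random
    cards = [random.gauss(50, 10) for _ in range(1000)]
    print(parallel_sort(cards, mean=50, stdev=10) == sorted(cards))
    print(parallel_contains(list(range(100)), 73))
```

The result is correctly sorted whatever statistics you feed in, but the speed depends on them: if the assumed mean and standard deviation are far off, one worker ends up with nearly all the cards while the other two sit idle. Note also that dealing the cards into buckets is done by a single thread, a small Eve-thread problem of its own.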
