Shirky: Grid Supercomputing: The Next Push

Clay Shirky’s Writings About the Internet.
Grid Supercomputing: The Next Push.
Grid Computing is, according to the Grid Information Centre, a way to “... enable the sharing, selection, and aggregation of a wide variety of geographically distributed computational resources.” It is, in other words, an attempt to make Sun’s famous pronouncement “The Network Is The Computer” an even more workable proposition. (It is also an instantiation of several of the patterns of decentralization that used to travel together under the name peer-to-peer.)
Despite the potential generality of the Grid, most of the public pronouncements are focusing on the use of Grids for supercomputing. IBM defines it more narrowly: Grid Computing is “... applying resources from many computers in a network - at the same time - to a single problem”, and the MIT Technology Review equated Grid technology with supercomputing on tap when it named Grids one of “Ten Technologies That Will Change the World.”
This view is wrong. Supercomputing on tap won’t live up to this change-the-world billing, because computation isn’t a terribly important part of what people do with computers. This is a lesson we learned with PCs, and it looks like we will be relearning it with Grids.
The Misnomer of the Personal Computer.
Though most computational power lives on the world’s hundreds of millions of PCs, most PCs are not used for computation most of the time. There are two reasons for this, both of which are bad news for predictions of a supercomputing revolution. The first is simply that most people are not sitting at their computers for most hours of the day. The second is that even when users are at their computers, they are not tackling computationally hard problems, and especially not ones that require batch processing: submit question today, get answer tomorrow (or next week). Indeed, whenever users encounter anything that feels even marginally like batch processing (a spreadsheet that takes seconds to sort, a Photoshop file that takes a minute to render), they begin hankering for a new PC, because they care about peak performance, not total number of cycles available over time. The only time the average PC performs any challenging calculations is when it is rendering the visuals for The Sims or WarCraft.
Therein lies the conundrum of the Grid-as-supercomputer: the oversupply of cycles the Grid relies on exists because of a lack of demand. PCs are used as many things — file cabinets and communications terminals and typewriters and photo albums and jukeboxes — before they are used as literal computers. If most users had batch applications they were willing to wait for even as long as overnight, the first place they would look for spare cycles would be on their own machines, not on some remote distributed supercomputer. Simply running their own PC round the clock would offer a 10x to 20x improvement, using hardware they already own.
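One way to read the 10x to 20x figure: if a PC does useful work for only 5 to 10 percent of the day, running it around the clock multiplies the cycles available to its owner by 10 to 20. The sketch below is illustrative only. The busy-hours values are an assumption chosen to reproduce the figure, not a measurement, and the function name is hypothetical.

```python
def round_the_clock_gain(busy_hours_per_day):
    """Factor by which usable cycles grow if the PC runs all 24 hours.

    busy_hours_per_day is the (assumed) number of hours per day the
    machine currently spends doing useful work.
    """
    if not 0 < busy_hours_per_day <= 24:
        raise ValueError("busy_hours_per_day must be in (0, 24]")
    return 24 / busy_hours_per_day


if __name__ == "__main__":
    # 2.4 and 1.2 hours correspond to 10% and 5% utilization.
    for hours in (2.4, 1.2):
        gain = round_the_clock_gain(hours)
        print(f"{hours} busy hours/day -> about {gain:.0f}x more cycles")
```

Under these assumed utilization levels, the user’s own idle machine is the cheapest source of spare cycles, which is the point of the paragraph above.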
If users needed Grid-like power, the Grid itself wouldn’t work, because the unused cycles the Grid is going to aggregate wouldn’t exist. Of all the patterns supported by decentralization, from file-sharing to real-time collaboration to supercomputing, supercomputing is the least general.
The Parallel with Push.
There is a parallel between Grids and Push technology, that glamorous flameout of the mid-90s. The idea behind Push, exemplified by the data-displaying screensaver PointCast, was that because users suffered from limited bandwidth and periodic disconnection (e.g. laptops on airplanes), they would sign up to have data pushed to them, which they could then experience at their leisure. This, we were told, would create a revolution in the way people use the internet. (This notion reached its apotheosis in a Wired magazine cover story, “Push!”, whose subtitle read “Kiss your browser goodbye: The radical future of media beyond the Web”.)
As it turned out, users’ response to poor connectivity was to agitate for better connectivity, because, as with CPUs, users want bandwidth that provides good peak performance, even if that means most of it gets “wasted.” Shortly after the Wired cover, it was PointCast we kissed goodbye.
Push’s collapse was made all the more spectacular because of its name. The label Push seemed to suggest a sweeping new pattern of great importance. Had the technology been given a duller but more descriptive name, like “forward caching,” it would have generated much less interest in the beginning, but might also not have been so prematurely consigned to the list of failed technologies.
Forward caching is in fact a key part of some applications. In particular, companies building decentralized groupware, like Groove, Kubi Software, and Shinkuro, all use forward caching of shared files to overcome the difficulties caused by limited bandwidth and partially disconnected nodes, just the issues Push was supposed to address. By pushing the name Push, the PointCasts of the world made it harder to see that though forward caching was not universally important, it was still valuable in some areas.
Distributed Batch Processing.
So it is with Grids. The evocative name suggests that computation is so critical that we must have a global infrastructure to provide all those cycles we’ll be needing next time our boss asks us to model an earthquake, or we have to help our parents crack a cryptographic key. The broadness of the term masks the specialised nature of the technology, which should probably be called “distributed batch processing.”
Like forward caching, distributed batch processing is useful in a handful of areas. The SETI@Home project runs on distributed batch processing, as does the distributed.net cryptographic key-breaking tool. The sequencing of the SARS virus happened using distributed batch processing. Distributed batch processing could be useful in fields like game theory, where scenarios could be exhaustively tested on the cheap, or animated film, where small studios or even individuals could afford access to Pixar-like render farms.
Distributed batch processing is real progress for people who need supercomputing power, but having supercomputing on tap doesn’t make you a researcher any more than having surfboard wax on tap would make you a surfer. Indeed, to the consternation of chip manufacturers (and the delight of researchers who want cheap cycles), people don’t even have much real use for the computational power on the machines they buy today.
History has not been kind to business predictions based on an undersupply of cycles, and the business case for selling access to supercomputing on tap is grim. Assuming that a $750 machine with a 2 gigahertz chip can be used for 3 years, commodity compute time now costs roughly a penny a gigahertz-hour. If Grid access costs more than a penny a gigahertz-hour, building a dedicated supercomputer starts to be an economical proposition, relative to buying cycles from a Grid. (And of course Moore’s Law sees to it that these economics get more adverse every year.)
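The penny figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the machine runs around the clock for the full three years and ignores electricity, administration, and resale value, none of which the essay specifies. The function and constant names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # assumes round-the-clock operation, no leap days


def cost_per_ghz_hour(price_dollars, clock_ghz, lifetime_years):
    """Hardware cost in dollars per gigahertz-hour of compute time."""
    ghz_hours = clock_ghz * lifetime_years * HOURS_PER_YEAR
    return price_dollars / ghz_hours


if __name__ == "__main__":
    # The essay's figures: a $750 machine, a 2 GHz chip, 3 years of use.
    cost = cost_per_ghz_hour(price_dollars=750, clock_ghz=2, lifetime_years=3)
    print(f"{cost * 100:.2f} cents per GHz-hour")
```

With these inputs the machine delivers 52,560 gigahertz-hours, so the cost works out to about 1.4 cents per gigahertz-hour, consistent with “roughly a penny.”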
Most of the for-profit work on supercomputing Grids will be in helping businesses harness their employees’ PCs so that the CFO can close the books quickly: cheap, one-shot contracts, in other words, that mostly displace money from the purchase of new servers. The cost savings for the average business will be nice, of course, but saving money by deferring server purchases is hardly a revolution.
People Matter More Than Machines.
We have historically overestimated the value of connecting machines to one another, and underestimated the value of connecting people, and by emphasizing supercomputing on tap, the proponents of Grids are making that classic mistake anew. During the last great age of batch processing, the ARPAnet’s designers imagined that the nascent network would be useful as a way of providing researchers access to batch processing at remote locations. This was wrong, for two reasons. First, it turned out researchers were far more interested in getting their own institutions to buy computers they could use locally than in using remote batch processing, and Moore’s Law made that possible as time passed. Second, once email was ported to the network, it became a far more important part of the ARPAnet backbone than batch processing was. Then as now, access to computing power mattered less to the average network user than access to one another.
Though Sun was incredibly prescient in declaring “The Network is the Computer” at a time when PCs didn’t even ship with built-in modems, the phrase is false in some important ways: a network is a different kind of thing than a computer. As long ago as 1968, J.C.R. Licklider predicted that computers would one day be more important as devices of communication than of computation, a prediction that came true when email overtook the spreadsheet as the core application driving PC purchases.
What was true of the individual PC is true of the network as well — changes in computational power are nice, but changes in communications power are profound. As we learned with Push, an intriguing name is no substitute for general usefulness. Networks are most important as ways of linking unevenly distributed resources — I know something you don’t know; you have something I don’t have — and Grid technology will achieve general importance to the degree that it supports those kinds of patterns. The network applications that let us communicate and share in heterogeneous environments, from email to Kazaa, are far more important uses of the network than making all the underlying computers behave as a single supercomputer.