Story URL: http://news.medill.northwestern.edu/chicago/news.aspx?id=133011
Story Retrieval Date: 9/23/2014 9:22:58 AM CST
In a pale, nondescript warehouse about 30 miles outside Chicago, Bill Allcock’s booming voice rises above the mechanical drone around a custom-designed building-within-a-building that houses the fifth-fastest supercomputer in the world — a machine called “Intrepid.”
Intrepid is an IBM Blue Gene/P supercomputer: a huge, ultra-powerful machine used by scientists and researchers to model complex systems with a degree of precision never before achieved. Built over five years, at a cost of $77 million, Intrepid exemplifies the state of the art in supercomputing. In this realm, speed isn’t measured in gigahertz but rather in floating-point operations, or flops. Intrepid provides half a petaflop of computation capacity, or 500 trillion flops. A poster on the wall says it would take two stacks of MacBook laptops, each as tall as the Sears Tower, to match Intrepid’s power.
“This is a 50,000 cubic-foot-per-minute air handler,” says Allcock, the operations director for the Argonne Leadership Computing Facility. “You see these typically on skyscrapers — they are the things that move the ventilation inside skyscrapers. We have six of them, so we move 300,000 cubic-feet-per-minute of air through this room.” Moving all that air is key to keeping Intrepid’s 160,000 cores, or central processing units, from overheating under the load of the advanced scientific calculations being carried out 24 hours a day.
And Intrepid is just one of the systems at Argonne National Laboratory, which is positioned at the cutting edge of computing science in the 21st century.
“This is one of two leadership computing facilities,” said Katherine Riley, a scientific applications engineer at Argonne. “The concept behind the leadership computing facilities was to really keep supremacy in the world in high-performance computing.”
Beyond Intrepid, the computing facilities at Argonne contribute a multitude of other important resources to the scientific community. Advanced visualization systems developed at Argonne help researchers explore the structure of proteins and understand the mechanics of supernovae — exploding stars. Software, such as the Access Grid, allows collaborators all over the world to interact in real time using sophisticated video conferencing. The open-source Nimbus software lets scientists and businesses summon supercomputers on demand and on the cheap. And the collective experience of years in the high-performance computing field feeds back into the development of next-generation systems that will dwarf today’s supercomputers in speed and efficiency.
Intrepid is the largest installation of IBM’s Blue Gene architecture to date, though Lawrence Livermore National Laboratory in Livermore, Calif., is acquiring one of the same size. Access to the machine is limited to a pool of projects the Department of Energy has vetted for scientific significance. Some of the projects using the Blue Gene supercomputer this year include an experiment by David Baker, of the University of Washington, to predict protein structures; one by Paul Fischer, an Argonne researcher, examining fluid flow inside nuclear reactor cores; and a model by Don Lamb, of the University of Chicago, simulating how stars self-destruct through supernovae.
But that much virtual brainpower doesn’t come easy, and it entails a host of complicated considerations. Power and heat are two of the most crucial. The densely packed microprocessors generate enormous amounts of heat that must be vented, and the power requirements of running not just the computer but all the ancillary systems such as cooling, network and storage are demanding. Intrepid draws 1.2 megawatts of power, and its storage systems run on an additional half-megawatt.
To overcome these challenges, Argonne scientists have begun optimizing when software runs based on the heat generated.
“Certain codes run hotter than other codes,” said Mike Papka, a deputy associate laboratory director at Argonne. He said a code that “grinds” a processor — pushes it to full capacity, that is — might run, say, 20 degrees hotter than one which grinds only intermittently.
Papka demonstrated how Argonne scientists monitor the Blue Gene’s heat output using an array of 15 video projectors to visualize the temperature of all 160,000 cores. The projectors are linked into a display system called the Active Mural, also developed at Argonne, that allows an observer to check the status of every one of those processors at a glance. Papka said the novel interface is essential to running a computer as powerful as Intrepid.
“You couldn’t look at this on your desktop,” he said.
Just as important as visualization is collaboration, and Argonne leads the way on that front, as well. The Access Grid developed there provides a framework for digital collaboration around the world by building on the notion of video conferencing.
“It kind of just looks like your living room,” said Papka, showing off a comfortable space furnished with a couch and coffee table. “One kind of departure from your traditional videoconferencing is lots of cameras.” He explained that the Access Grid utilizes four cameras — including one mounted on the ceiling — all of which can pan, tilt and zoom to provide a sense of depth and perspective.
“You’ll have multiple views up of all the other sites, and we’ll have somewhere between four and 20, maybe as many as 30 video streams going at a time,” he said.
Like most Argonne software, the program underlying the Access Grid is open source, meaning anyone can download it, use it and modify it for his or her own purposes. Open-source software allows everyone to reap the benefits of the work pioneered at Argonne, a taxpayer-supported facility.
“You guys already paid for it once,” Papka said. “We shouldn’t charge for it again.”
Also on the roster of open-source products out of Argonne is Nimbus, software that allows anyone to rig a virtual supercomputer at low cost in only a few minutes. Instead of relying on a single massive machine such as the Blue Gene, the Nimbus software takes advantage of large aggregations of computer resources known as “clouds.”
In the traditional “grid-computing” model, a single super-machine runs on a cluster of locally networked computers. By contrast, cloud computing can pool the processing power, memory and storage space of thousands of remote computers and allow users to run “virtual machines.”
Cloud computing is also known as “Infrastructure-as-a-Service” because it commodifies the cost of large-scale computing resources. Amazon recently introduced this concept to the commercial market as a for-profit service called the Amazon Elastic Compute Cloud.
Nimbus, on the other hand, is aimed primarily at scientific users, who can tap into a number of science-related clouds, free of charge. That service grants universities and researchers around the world access to supercomputing power that may previously have been beyond their reach.
Commodifying infrastructure is gaining importance in the world of high-performance computing, and the idea may point to a possible way forward.
“One of the ideas that is sort of bouncing around in a lot of these conversations is: In five to 10 years, maybe you’re not charging people by how many CPU hours they’re using but by how much power they’re using,” said Kate Keahey, who heads up the Nimbus project at Argonne.
Allcock offered a similar assessment.
“Five years ago, nobody thought twice about power,” he said. “That is a huge sea change. Five years ago, people were just starting to get an inkling that we can’t keep going this way forever, and nobody thought about what their power budget was.”
But he said the exponential relationship between computing capacity and power usage had reoriented the research away from faster chips and toward more efficient clustering of resources.
“In the future of supercomputing, [electrical] power is everything,” he said. “What we are planning toward is a machine that’s 40 times as powerful as [Intrepid] in about twice the space, and [with] about three times the power consumption.”
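The arithmetic behind that target can be sketched in a few lines of Python. This is only an illustration built from the figures quoted in this story (half a petaflop, 1.2 megawatts, 40 times the speed, three times the power). It assumes the "three times" applies to Intrepid's 1.2-megawatt compute draw and leaves out the half-megawatt used by storage; the variable and function names are made up for the example.

```python
# Figures quoted in the article
INTREPID_FLOPS = 500e12     # half a petaflop
INTREPID_POWER_W = 1.2e6    # 1.2 megawatts (assumption: compute only, storage excluded)

# Allcock's planning targets, expressed relative to Intrepid
SPEEDUP = 40        # "40 times as powerful"
POWER_FACTOR = 3    # "about three times the power consumption"


def flops_per_watt(flops, watts):
    """Return computing throughput delivered per watt of electrical power."""
    return flops / watts


current = flops_per_watt(INTREPID_FLOPS, INTREPID_POWER_W)
planned = flops_per_watt(INTREPID_FLOPS * SPEEDUP, INTREPID_POWER_W * POWER_FACTOR)
efficiency_gain = planned / current

print(f"Intrepid: {current / 1e6:.0f} megaflops per watt")
print(f"Planned:  {planned / 1e9:.2f} gigaflops per watt")
print(f"Gain:     {efficiency_gain:.1f}x")
```

Under those assumptions, the planned machine would need to deliver roughly 13 times as much computation per watt as Intrepid does, which is why efficiency rather than raw chip speed dominates the planning.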
But Allcock and Papka stressed that significant problems remain before computing tasks can be spread with optimal efficiency across large numbers of processors.
“We have ideas,” Papka said. “We don’t have the answers. That’s why it’s a research problem.”