GPU Computing
Saturday, November 29th, 2008 at 9:09PM PST
You’ve no doubt heard of the phenomenal computational capabilities of modern graphics cards. For problems in certain domains, their performance can be rather extraordinary, and distributed computing is one field that is attempting to take advantage of it. Folding@Home has been testing a GPU-based client for some time, and the team over at GPUGRID has been working to bring the same enhancements to a BOINC project.
However, a few days ago, the good folks at distributed.net released the first public beta (for x86-64 Linux) of a GPU client for NVIDIA cards using the CUDA platform. The client currently only supports the RC5-72 project and not OGR, which might be disappointing for some, as the RC5-72 project is of dubious scientific or mathematical value, though there is a prize involved.
In any case, I have been testing the client over the past few days on my NVIDIA 512MB GeForce 8800 GTS and here are my observations:
The thing is ridiculously fast. My Intel Core 2 Quad Q6600 overclocked to 2.97 GHz is already quite fast, but the graphics card completely smokes it. Using all four cores, the Q6600 will average about 800 RC5-72 blocks per day. If utilized all day on my aforementioned graphics card, the CUDA-enabled client would complete approximately 6200 blocks. That’s almost eight times as fast. Granted, these GPUs can only be used in some problem domains, but they clearly have an advantage in the areas where they are well-suited. Nonetheless, a few issues remain that will prevent me from using the client to its full capacity.
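For anyone who wants to check the arithmetic, here is a quick sketch in Python. It uses only the two figures quoted above. The variable and function names are my own, and the per-core number assumes the four cores contribute equally, which I have not measured.

```python
# Blocks-per-day figures quoted in the post; all names here are illustrative.
CPU_BLOCKS_PER_DAY = 800    # Q6600 using all four cores
GPU_BLOCKS_PER_DAY = 6200   # 8800 GTS if the CUDA client ran all day
CPU_CORES = 4


def speedup(gpu_rate, cpu_rate):
    """Return how many times faster the GPU rate is than the CPU rate."""
    return gpu_rate / cpu_rate


# GPU compared with the whole quad-core CPU.
whole_cpu = speedup(GPU_BLOCKS_PER_DAY, CPU_BLOCKS_PER_DAY)

# GPU compared with a single core, ASSUMING the cores share the work evenly.
per_core = speedup(GPU_BLOCKS_PER_DAY, CPU_BLOCKS_PER_DAY / CPU_CORES)

print(f"GPU vs. whole CPU: {whole_cpu:.2f}x")
print(f"GPU vs. one core:  {per_core:.0f}x")
```

Under that even-split assumption, the card does roughly the work of a few dozen of these cores on this particular workload.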
First, the client pegs one of the CPU cores at 100%, meaning you’re trading one CPU core for the performance of the GPU. All things considered, this is probably a fair trade, though if you like to run several different projects, it’s more limiting. (RC5-72 probably isn’t interesting enough to me to run on my CPUs at the moment, though that could change at any time.) At GPUGRID, they seem to have been able to reduce the CPU load caused by the client to a few percent on Linux, so this problem may yet be solvable to some degree.
More importantly, however, desktop performance becomes quite sluggish while the client is running, to the point of being unusable. Perhaps I bring this upon myself by using maximized applications on a 1920×1200 screen, but it’s an issue nonetheless. A separate test client (using 64 threads instead of 128) made available by one of the authors was slightly better in this regard, but probably not enough for me to tolerate. However, this problem in no way prevents me from running the client while I sleep or am otherwise away from the computer, which can add up to a lot of hours a week. (Over the past couple of days, I’ve been averaging about 4000 RC5-72 blocks per day.)
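As a rough sanity check, the 4000-block average can be turned back into hours of GPU time per day. The sketch below assumes throughput scales linearly with run time and that the 6200 blocks per day full-time estimate holds. The names are illustrative, not anything from the client itself.

```python
# Figures quoted in the post; names are illustrative.
GPU_BLOCKS_PER_DAY = 6200        # estimated output if the client ran 24 hours
OBSERVED_BLOCKS_PER_DAY = 4000   # recent average with the client run part-time


def implied_hours(observed, full_day_rate, hours_in_day=24):
    """Estimate hours of run time per day, ASSUMING output scales linearly."""
    return observed / full_day_rate * hours_in_day


hours = implied_hours(OBSERVED_BLOCKS_PER_DAY, GPU_BLOCKS_PER_DAY)
print(f"Implied GPU run time: about {hours:.1f} hours per day")
```

That works out to somewhere around fifteen and a half hours a day, which is plausible for a machine left crunching overnight and during work hours.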
So, while the client is somewhat rough around the edges, the performance gains it brings are about to make the whole distributed computing arena much more interesting. My only hope is that other suitable projects begin to adopt the technology. I know not all of them will be able to take advantage of it at this stage (especially those requiring double-precision floating point math), but it’s still something to look forward to. It’s an exciting time for the technology enthusiasts of the world.