Hi,

>> Beyond that, there may be a few more things that can be done to make R run
>> "stupidly fast" on ps3 or IBM Cell blades.
>
> Wouldn't the right way to go here be to make it use the PS3 graphics
> hardware, in a http://www.gpgpu.org/ kind of way? Or are the Cell
> processors on the PS3 graphics processors too?

The Cell can be thought of as a mini cluster on a chip. It uses messaging, much as one might program a distributed application using MPI or organize a program around remote procedure calls. However, the latency is about 5 nanoseconds instead of the 5 microseconds one might hope to get with typical high-performance networking fabrics. Applications for the Cell are typically multithreaded on the PPU, with each PPU thread controlling synchronous, non-timeshared activity on the SPUs.
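To make the "PPU threads controlling SPUs" idea a bit more concrete, here is a minimal sketch in plain C. It is only an analogy: ordinary POSIX threads stand in for the SPUs, and a small struct stands in for the message sent to each one. The names `work_msg`, `worker_main`, `parallel_sum` and the worker count `NUM_WORKERS` are all made up for illustration and are not part of any Cell SDK or of R.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Assumption: an arbitrary worker count chosen for illustration. */
#define NUM_WORKERS 4

/* Hypothetical "message" handed to one worker: a slice of the input
 * goes in, a partial result comes back. */
struct work_msg {
    const double *data;   /* start of this worker's slice */
    size_t count;         /* number of elements in the slice */
    double partial_sum;   /* filled in by the worker */
};

/* Worker body: runs to completion on its own slice, with no sharing
 * and no time slicing assumed between workers. */
static void *worker_main(void *arg)
{
    struct work_msg *msg = arg;
    double sum = 0.0;

    for (size_t i = 0; i < msg->count; i++)
        sum += msg->data[i];
    msg->partial_sum = sum;
    return NULL;
}

/* Controller: split the vector, start one thread per worker, wait for
 * each to finish, then combine the partial results.
 * Returns 0 on success, -1 if a thread could not be started. */
int parallel_sum(const double *data, size_t n, double *out)
{
    pthread_t threads[NUM_WORKERS];
    struct work_msg msgs[NUM_WORKERS];
    size_t base = n / NUM_WORKERS;    /* minimum slice length */
    size_t extra = n % NUM_WORKERS;   /* first `extra` slices get one more */
    size_t offset = 0;
    int started = 0;
    int status = 0;
    double total = 0.0;

    assert(data != NULL);
    assert(out != NULL);

    for (int w = 0; w < NUM_WORKERS; w++) {
        msgs[w].data = data + offset;
        msgs[w].count = base + ((size_t)w < extra ? 1 : 0);
        msgs[w].partial_sum = 0.0;
        offset += msgs[w].count;

        if (pthread_create(&threads[w], NULL, worker_main, &msgs[w]) != 0) {
            status = -1;
            break;
        }
        started++;
    }

    /* Always join whatever was started, even on failure. */
    for (int w = 0; w < started; w++) {
        pthread_join(threads[w], NULL);
        total += msgs[w].partial_sum;
    }

    if (status == 0)
        *out = total;
    return status;
}
```

Calling `parallel_sum(v, n, &total)` from a main program (built with `gcc -pthread`) runs the whole split, dispatch, and combine cycle. On a real Cell the worker body would be a separate SPU program and the slice would be moved by DMA rather than read through a shared pointer, so treat this purely as a picture of the control structure.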
The PS3's primary processor, the Cell PPU, is on the same silicon as the coprocessors, the SPUs. The PPU is basically like what's in a Mac G5. The SPUs have less smarts in terms of out-of-order execution and lookahead, and they have only 256K of local store each. Messaging is done using a DMA controller, and there are some C routines for that. Newer versions of the GNU toolchain have an overlay manager, so that the 256K local store can be managed automatically for the most part. The SPU instruction set is new, but GCC has a cross compiler that works fine for it.

I think a lot of the work to make R take advantage of the Cell would be pretty general, e.g. localizing the scope of operations and making operations multithreaded. It's desirable to keep computations on the local store as much as possible, but it doesn't seem to be crucial, because the messaging is extremely fast. It's almost like worrying about processor affinity on an SMP system. Also, there's more need to profile and optimize the operations on the SPUs (e.g. by explicit prefetching), as they are dumb compared to a modern microprocessor.

One nuisance with the PS3 itself (as opposed to IBM blades) is the limited RAM. There's only 256MB. It's pretty painful to bootstrap GCC, for example. The RAM itself (Rambus XDR) is several times faster latency-wise than DDR2.

Marcus

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel