I wouldn't completely discount them yet.

First, while absolute performance is probably lower than an i7 in most benchmarks, their headline metric is GFLOPS/watt, not raw GFLOPS. Once you include the overhead of running a large system, including UPS, cooling, and power draw, the concept looks more cost competitive. It also has applications for consumer devices running compute-intensive algorithms; you can't put an i7 in a phone.

Second, not all parallel workloads run most efficiently on GPUs. GPUs deliver incredible throughput, but they assume very predictable memory access patterns. One big limitation of GPUs (at least with NVIDIA) is that you can't easily share data between streaming multiprocessors (SMs) in a coordinated manner without going through device memory (you can within each SM). The coordination primitives that operate on device memory also reduce performance dramatically. Hardware with better support for more general memory access patterns should improve performance on those workloads.
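To make the SM point concrete, here's a minimal CUDA sketch (the kernel name, block size, and test data are just illustrative, not from anything above): threads within one block, which runs on a single SM, cooperate through on-chip __shared__ memory and __syncthreads(), but combining results across blocks has to go through device (global) memory, here via atomicAdd, which is exactly the slower coordination path described above.

    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    // Per-block reduction: threads in the same block share data through
    // fast on-chip __shared__ memory and coordinate with __syncthreads(),
    // never leaving the SM they run on.
    __global__ void block_sum(const float *in, float *total, int n) {
        __shared__ float tile[256];          // visible only to this block
        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid;
        tile[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();

        // Tree reduction within the block (block size is a power of two).
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (tid < stride) tile[tid] += tile[tid + stride];
            __syncthreads();
        }

        // Crossing block (SM) boundaries forces a round trip through
        // device memory; atomicAdd on global memory is the coordination
        // primitive, and it is far more expensive than the shared-memory
        // traffic above.
        if (tid == 0) atomicAdd(total, tile[0]);
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> h_in(n, 1.0f);    // dummy data: all ones
        float *d_in = nullptr, *d_total = nullptr;
        cudaMalloc(&d_in, n * sizeof(float));
        cudaMalloc(&d_total, sizeof(float));
        cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemset(d_total, 0, sizeof(float));

        block_sum<<<(n + 255) / 256, 256>>>(d_in, d_total, n);

        float total = 0.0f;
        cudaMemcpy(&total, d_total, sizeof(float), cudaMemcpyDeviceToHost);
        printf("sum = %.0f (expected %d)\n", total, n);

        cudaFree(d_in);
        cudaFree(d_total);
        return 0;
    }

The shared-memory loop is cheap because it never leaves the SM; the single atomicAdd per block is the only inter-SM communication, and scaling that kind of global-memory coordination up is where GPU performance tends to fall off.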
There are a few more details of the hardware architecture here: http://www.adapteva.com/products/silicon-devices/e64g401/ It seems pretty reasonable; nothing out of the ordinary, they just made different decisions about what to prioritize. Just skimming, it looks like you should be able to get lower latency when sharing data across compute nodes than with CUDA's architecture.