I wouldn't completely discount them yet. First, while absolute performance 
is probably lower than an i7 in most benchmarks, their main performance 
metric is GFLOPS/watt, not just pure GFLOPS. Once you include the overhead 
of managing a large system (UPS, cooling, and power draw), the concept 
looks more cost competitive. It also opens up consumer devices to more 
compute-intensive algorithms; you couldn't put an i7 in a phone. Second, 
not all parallel workloads run most efficiently on GPUs. GPUs deliver 
incredible performance but assume very predictable memory access patterns. 
One big limitation of GPUs (at least with NVIDIA) is that you can't easily 
share data between streaming multiprocessors (SMs) in a coordinated manner 
without going through device memory (you can within each SM), and the 
coordination primitives for device memory reduce performance dramatically. 
If you add more hardware support for general memory access patterns, 
performance should improve.
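
To make that SM point concrete, here's a rough CUDA sketch (my own 
illustration, not anything from the Adapteva or NVIDIA docs; the kernel 
name and sizes are made up). Threads within a block, which runs on a 
single SM, coordinate through fast on-chip shared memory, but combining 
results across blocks has to go through device memory with atomics:

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void sharedVsGlobalSum(const float *in, float *total) {
    // Fast path: threads in this block cooperate through on-chip shared
    // memory; __syncthreads() is the cheap intra-SM coordination primitive.
    __shared__ float tile[256];
    int tid = threadIdx.x;
    tile[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();

    // Standard tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }

    // Slow path: combining partial results across blocks (i.e. across SMs)
    // has to go through global/device memory; atomicAdd is the coordination
    // primitive here, and it is far more expensive than the shared-memory path.
    if (tid == 0) atomicAdd(total, tile[0]);
}

int main() {
    const int blocks = 64, threads = 256, n = blocks * threads;
    std::vector<float> hIn(n, 1.0f);          // all ones, so the expected sum is n
    float *dIn, *dTotal, hTotal = 0.0f;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dTotal, sizeof(float));
    cudaMemset(dTotal, 0, sizeof(float));
    cudaMemcpy(dIn, hIn.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    sharedVsGlobalSum<<<blocks, threads>>>(dIn, dTotal);
    cudaMemcpy(&hTotal, dTotal, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %.0f (expected %d)\n", hTotal, n);

    cudaFree(dIn);
    cudaFree(dTotal);
    return 0;
}

The point is just that the only way two SMs can see each other's results 
here is that final atomicAdd through device memory, which is what I meant 
about the coordination primitives hurting performance.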

There are a few more details of the hardware architecture here:
http://www.adapteva.com/products/silicon-devices/e64g401/

Seems pretty reasonable. Nothing out of the ordinary; they just made 
different decisions about what to prioritize. Just from skimming, it looks 
like you should be able to get lower latency when sharing data across 
compute nodes than with CUDA's architecture.
