600s for Spark vs 5s for Redshift... The numbers look very different from the AMPLab benchmark:
https://amplab.cs.berkeley.edu/benchmark/

Is it something like SSDs that's helping Redshift, or is the whole dataset in memory when you run the query? Could you publish the query?

Also, now that there's spark-sql, are we planning to add spark-sql runtimes to the AMPLab benchmark as well?

On Sun, Jun 22, 2014 at 9:13 AM, Toby Douglass <t...@avocet.io> wrote:

> I've just benchmarked Spark and Impala. Same data (in s3), same query,
> same cluster.
>
> Impala has a long load time, since it cannot load directly from s3. I
> have to create a Hive table on s3, then insert from that to an Impala
> table. This takes a long time; Spark took about 600s for the query, Impala
> 250s, but Impala required 6k seconds to load data from s3. If you're going
> to go the long-initial-load-then-quick-queries route, go for Redshift. On
> equivalent hardware, that took about 4k seconds to load, but then queries
> are like 5s each.
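
For what it's worth, the load-then-query trade-off in the quoted numbers can be put into a rough break-even calculation. This is only a sketch: it takes the quoted figures at face value (they are all approximate), assumes Spark has no separate load step because it reads straight from s3, and assumes every query costs the same as the one that was measured. The names `COSTS`, `total_seconds` and `break_even_queries` are made up for illustration.

```python
# (load_seconds, seconds_per_query), from the approximate figures quoted above.
# Assumption: Spark has no separate load step since it queries the s3 data directly.
COSTS = {
    "spark": (0, 600),
    "impala": (6000, 250),
    "redshift": (4000, 5),
}


def total_seconds(engine, n_queries):
    """Load time plus n_queries runs, assuming a constant per-query cost."""
    load, per_query = COSTS[engine]
    return load + per_query * n_queries


def break_even_queries(engine, baseline="spark"):
    """Smallest number of queries at which `engine` is strictly faster overall
    than `baseline`, or None if it never catches up."""
    load, per_query = COSTS[engine]
    base_load, base_per_query = COSTS[baseline]
    saved_per_query = base_per_query - per_query
    if saved_per_query <= 0:
        return None  # no per-query advantage, so the load cost is never repaid
    extra_load = load - base_load
    # Smallest integer n with extra_load < saved_per_query * n.
    return max(0, extra_load // saved_per_query + 1)


if __name__ == "__main__":
    for engine in ("impala", "redshift"):
        n = break_even_queries(engine)
        print(f"{engine}: ahead of spark from {n} queries onwards "
              f"({total_seconds(engine, n)}s vs {total_seconds('spark', n)}s)")
```

Under those assumptions the up-front load only pays off if you run more than a handful of queries against the same data, which is why the actual query and cluster setup matter so much for comparing these numbers.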