600s for Spark vs. 5s for Redshift... These numbers look very different from
the amplab benchmark:

https://amplab.cs.berkeley.edu/benchmark/

Is it something like SSDs that's helping Redshift, or is the whole dataset
in memory when you run the query? Could you publish the query?

Also, now that spark-sql is available, are we planning to add spark-sql
runtimes to the amplab benchmark as well?



On Sun, Jun 22, 2014 at 9:13 AM, Toby Douglass <t...@avocet.io> wrote:

> I've just benchmarked Spark and Impala.  Same data (in s3), same query,
> same cluster.
>
> Impala has a long load time, since it cannot load directly from s3.  I
> have to create a Hive table on s3, then insert from that to an Impala
> table.  This takes a long time; Spark took about 600s for the query, Impala
> 250s, but Impala required 6k seconds to load data from s3.  If you're going
> to go the long-initial-load-then-quick-queries route, go for Redshift.  On
> equivalent hardware, that took about 4k seconds to load, but then queries
> are like 5s each.
>
>
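
For what it's worth, here is a rough break-even sketch built only from the approximate figures quoted above. It assumes Spark has no separate load step (it reads straight from s3), that every query costs the same each time it is run, and that the "about" numbers are exact; none of that is confirmed in the thread. The names `SYSTEMS`, `total_seconds` and `break_even_queries` are made up for illustration.

```python
# Approximate figures from the quoted message, in seconds.
# Assumption: Spark's load time is 0 because it queries the s3 data directly.
SYSTEMS = {
    "Spark": {"load": 0, "query": 600},
    "Impala": {"load": 6000, "query": 250},
    "Redshift": {"load": 4000, "query": 5},
}


def total_seconds(name, n_queries):
    """One-off load time plus n_queries identical queries."""
    s = SYSTEMS[name]
    return s["load"] + n_queries * s["query"]


def break_even_queries(candidate, baseline):
    """Smallest number of queries at which `candidate` is strictly
    faster overall than `baseline` (load time included)."""
    c, b = SYSTEMS[candidate], SYSTEMS[baseline]
    query_gain = b["query"] - c["query"]  # seconds saved per query
    if query_gain <= 0:
        raise ValueError("candidate is not faster per query than baseline")
    extra_load = c["load"] - b["load"]    # extra up-front cost to recover
    # Need n > extra_load / query_gain; floor division + 1 gives that n.
    return max(0, extra_load // query_gain + 1)


if __name__ == "__main__":
    for name in ("Impala", "Redshift"):
        n = break_even_queries(name, "Spark")
        print(f"{name} beats Spark overall from {n} queries onward")
```

Under those assumptions Impala only pays off after 18 queries and Redshift after 7, so the long initial load matters a lot if you run just a handful of queries.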
