Gents, I've been benchmarking Presto, Spark, Impala and Redshift.
Most recently I've been looking at minimum query latency. In all cases, the cluster consists of eight m1.large EC2 instances. The minimal data set is a single 3.5 MB gzipped file. With Presto (backed by S3), I see 1 to 2 seconds of latency. With Impala (backed by HDFS, as Impala does not support S3), I see about 1 second of latency. With Spark, I see about 9 seconds of latency. Thoughts?
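
To make "minimum query latency" concrete, here is a rough sketch of how such a number could be collected. It is not the harness behind the figures above: `run_query` is a hypothetical placeholder for whatever client call submits the query to a given engine and waits for the result, and the repeat count is an arbitrary assumption.

```python
import time


def min_latency(run_query, repeats=5):
    """Return the smallest wall-clock time in seconds over `repeats` calls.

    `run_query` is a hypothetical zero-argument callable that submits one
    query and blocks until the result comes back.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    # Taking the minimum filters out runs slowed by transient noise.
    return min(timings)


if __name__ == "__main__":
    # Stand-in workload; replace with a real client call for each engine.
    print(min_latency(lambda: time.sleep(0.01)))
```

Taking the minimum over several runs rather than the mean keeps one-off stalls (cold caches, a slow S3 request) from hiding the floor each engine can reach.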