Hi,

We write model output and other information as Parquet files, and later
let data APIs run SQL queries on the columnar data.

We use SparkSQL to write the data in Parquet format, and we are now
considering whether to use SparkSQL or Impala to read it back.

I came across this benchmark and was not sure whether it has been validated
by the Spark community / Databricks:

http://blog.cloudera.com/blog/2014/09/new-benchmarks-for-sql-on-hadoop-impala-1-4-widens-the-performance-gap/

Any input would be helpful.

Thanks.
Deb
