Hi, We write the output of our models and other information as Parquet files, and later we let data APIs run SQL queries on the columnar data.
We use Spark SQL to write the data in Parquet format, and we are now deciding whether to use Spark SQL or Impala to read it back. I came across this benchmark and was not sure whether it has been validated by the Spark community or Databricks: http://blog.cloudera.com/blog/2014/09/new-benchmarks-for-sql-on-hadoop-impala-1-4-widens-the-performance-gap/ Any input would be helpful. Thanks, Deb