Re: Conversely, Hive is performing better than Spark-Sql

2015-11-24 Thread Sabarish Sasidharan
First of all, select * is not a useful query to benchmark. A user would very rarely need all 3.6 million records for visual analysis. Second, collect() forces all data to move from the executors to the driver. Instead, write the result out to another table or to HDFS. Also Spark is more beneficial when you ha
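
To illustrate the collect() point, here is a minimal sketch for spark-shell on Spark 1.5.x. It assumes the shell was built with Hive support, so that the predefined `sqlContext` is a HiveContext. The table name `my_table`, the output path, and the target table `my_table_copy` are hypothetical placeholders, not names from the original post.

```scala
// Run inside spark-shell (Spark 1.5.x with Hive support), where `sqlContext`
// is assumed to be a HiveContext. All table names and paths are hypothetical.
val df = sqlContext.sql("SELECT * FROM my_table")

// Avoid this for large results: collect() ships every row to the driver.
// val rows = df.collect()

// Option 1: write the result to HDFS as Parquet. Executors write in parallel
// and nothing is funnelled through the driver.
df.write.mode("overwrite").parquet("hdfs:///tmp/my_table_copy")

// Option 2: write the result to another Hive table.
df.write.mode("overwrite").saveAsTable("my_table_copy")

// For a quick visual check, fetch only a handful of rows.
df.show(20)
```

Timing either write, rather than collect(), measures query execution without the cost of moving the full result set to the driver.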

Conversely, Hive is performing better than Spark-Sql

2015-11-24 Thread UMESH CHAUDHARY
Hi, I am using Hive 1.1.0 and Spark 1.5.1, and I am creating a HiveContext in spark-shell. I am seeing the reverse of the expected performance: Spark SQL is slower than Hive. By default, Hive returns results in 27 seconds for a plain select * query on a 1 GB dataset containing 3623203 records, while spark-sql gives back in