I am pretty new to Spark (using Spark 1.3.1), and I am trying to use Spark SQL to run some SQL scripts on the cluster. I realized that for better performance it is a good idea to use Parquet files.
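To give some context, here is a rough sketch of what I am trying to do (assuming Hive support is available in the binary; the database/table name and the HDFS path below are just placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object SparkSqlOnHive {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkSqlOnHive"))

    // HiveContext picks up table metadata from the Hive metastore,
    // so Parquet tables defined in Hive should be visible here.
    val hiveContext = new HiveContext(sc)

    // Option A: query the table through Hive (my_db.my_parquet_table is a placeholder).
    val fromHive = hiveContext.sql(
      "SELECT col_a, COUNT(*) AS cnt FROM my_db.my_parquet_table GROUP BY col_a")
    fromHive.show()

    // Option B: skip the Hive metastore and read the Parquet files directly
    // from HDFS (placeholder path). In Spark 1.3.x this is parquetFile(...).
    val fromHdfs = hiveContext.parquetFile("hdfs:///user/me/my_parquet_data")
    fromHdfs.registerTempTable("my_parquet_data")
    hiveContext.sql(
      "SELECT col_a, COUNT(*) AS cnt FROM my_parquet_data GROUP BY col_a").show()

    sc.stop()
  }
}
```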
I have two questions regarding that:

1) If I want to use Spark SQL against *partitioned & bucketed* tables with Parquet format in Hive, does the pre-built Spark binary on the Apache website support that, or do I need to build a new Spark binary with some additional flags? (I found a note in the documentation, <https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables>, about enabling Hive support, but I could not fully work out what the correct way of building is, if I do need to build.)

2) Does running Spark SQL against tables in Hive degrade performance, so that it would be better to load the Parquet files directly from HDFS, or is having Hive in the picture harmless?

Thanks!