I am fairly new to Spark. I am using Spark 1.3.1 and trying to use Spark
SQL to run some SQL scripts on the cluster. I understand that, for better
performance, it is a good idea to use Parquet files. I have two questions
about that:

1) If I want to use Spark SQL against *partitioned & bucketed* Parquet
tables in Hive, does the prebuilt Spark binary from the Apache website
support that, or do I need to build a new Spark binary with additional
flags? (I found a note in the documentation
<https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables>
about enabling Hive support, but I could not fully work out what the
correct way of building is, if I do need to build.)
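
For concreteness, this is roughly the kind of job I am hoping to run; the
database, table, and partition column names here are just placeholders I
made up for this question:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HivePartitionedQuery {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveParquetTest"))
    // Using HiveContext (rather than plain SQLContext) so that tables
    // defined in the Hive metastore are visible to Spark SQL.
    val hiveContext = new HiveContext(sc)

    // "my_db.events" is a hypothetical partitioned & bucketed Parquet table;
    // "dt" is an assumed partition column.
    val df = hiveContext.sql(
      "SELECT count(*) FROM my_db.events WHERE dt = '2015-05-01'")
    df.show()
  }
}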

2) Does running Spark SQL against tables defined in Hive degrade
performance, so that it would be better to load the Parquet files into HDFS
and read them directly, or is having Hive in the picture harmless?
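
In other words, would skipping the metastore and pointing Spark SQL
straight at the Parquet files, roughly like the sketch below (the HDFS path
is made up), be noticeably faster, or does going through the Hive table
definitions add no real overhead?

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DirectParquetQuery {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("DirectParquetTest"))
    val sqlContext = new SQLContext(sc)

    // Read the Parquet files straight from HDFS, bypassing the Hive
    // metastore; the path is only an example for this question.
    val events = sqlContext.parquetFile("hdfs:///data/events_parquet")
    events.registerTempTable("events")
    sqlContext.sql("SELECT count(*) FROM events").show()
  }
}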

Thanks
