I am trying to explain that these are not either/or decisions. You are likely going to be storing the data on HDFS no matter what other choices you make.
You can use parquet to store the data whether you are addressing files directly on HDFS or using the Hive metastore to locate the underlying files by table name. Parquet is likely faster than the default format for Hive tables, but with Hive you can say "STORED AS PARQUET" too (a rough sketch of both approaches follows the quoted thread below). I suggest you look at the programming guide: http://spark.apache.org/docs/latest/sql-programming-guide.html

Michael

On Tue, Mar 17, 2015 at 5:10 PM, 李铖 <lidali...@gmail.com> wrote:

> Did you mean that parquet is faster than the hive format, and the hive
> format is faster than plain HDFS files, for Spark SQL?
>
> : )
>
> 2015-03-18 1:23 GMT+08:00 Michael Armbrust <mich...@databricks.com>:
>
>> The performance has more to do with the particular format you are using,
>> not where the metadata is coming from. Even Hive tables are usually read
>> from files on HDFS.
>>
>> You probably should use HiveContext, as its query language is more
>> powerful than SQLContext's. Also, parquet is usually the faster data
>> format for Spark SQL.
>>
>> On Tue, Mar 17, 2015 at 3:41 AM, 李铖 <lidali...@gmail.com> wrote:
>>
>>> Hi, everybody.
>>>
>>> I am new to Spark. I want to do interactive SQL queries using Spark
>>> SQL. Spark SQL can run on top of Hive or load files directly from HDFS.
>>>
>>> Which is better or faster?
>>>
>>> Thanks.
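
For completeness, here is a minimal sketch of both approaches using the
Spark 1.3-era Scala API. The HDFS path, table name, and schema are made-up
placeholders, not anything from this thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object ParquetAccessExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("parquet-example"))
        val hiveContext = new HiveContext(sc)

        // Option 1: address the Parquet files directly on HDFS
        // (hypothetical path).
        val direct = hiveContext.parquetFile("hdfs:///data/events.parquet")
        direct.registerTempTable("events_direct")
        hiveContext.sql("SELECT COUNT(*) FROM events_direct").show()

        // Option 2: let the Hive metastore locate the files by table name.
        // The data is still Parquet on HDFS; only the metadata source differs.
        hiveContext.sql(
          """CREATE TABLE IF NOT EXISTS events (id BIGINT, name STRING)
            |STORED AS PARQUET""".stripMargin)
        hiveContext.sql("SELECT COUNT(*) FROM events").show()
      }
    }

Either way the bytes on disk are Parquet; the choice is only about whether
the file paths come from your own code or from the metastore.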