I am trying to explain that these are not either/or decisions. You are likely going to be storing the data on HDFS no matter what other choices you make.
You can use parquet to store the data whether you are addressing files directly on HDFS or using the Hive metastore to locate the underlying files by table name. Parquet is likely faster than the default format for Hive tables, but with Hive you can say "STORED AS PARQUET" too (a rough sketch of both approaches follows the quoted thread below). I suggest you look at the programming guide: http://spark.apache.org/docs/latest/sql-programming-guide.html

Michael

On Tue, Mar 17, 2015 at 5:10 PM, 李铖 <lidali...@gmail.com> wrote:

> Did you mean that parquet is faster than the hive format, and the hive
> format is faster than plain HDFS files, for Spark SQL?
>
> : )
>
> 2015-03-18 1:23 GMT+08:00 Michael Armbrust <mich...@databricks.com>:
>
>> The performance has more to do with the particular format you are using,
>> not where the metadata is coming from. Even Hive tables are usually read
>> from files on HDFS.
>>
>> You probably should use HiveContext, as its query language is more
>> powerful than SQLContext's. Also, parquet is usually the faster data
>> format for Spark SQL.
>>
>> On Tue, Mar 17, 2015 at 3:41 AM, 李铖 <lidali...@gmail.com> wrote:
>>
>>> Hi, everybody.
>>>
>>> I am new to Spark. I want to do interactive SQL queries using Spark
>>> SQL. Spark SQL can run on top of Hive or load files directly from HDFS.
>>>
>>> Which is better or faster?
>>>
>>> Thanks.
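
For completeness, here is a minimal sketch of both approaches using the
Spark 1.3-era Scala API. The HDFS path, table name, and schema are made-up
placeholders, not anything from this thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object ParquetAccessExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("parquet-example"))
        val hiveContext = new HiveContext(sc)

        // Option 1: address the Parquet files directly on HDFS
        // (hypothetical path).
        val direct = hiveContext.parquetFile("hdfs:///data/events.parquet")
        direct.registerTempTable("events_direct")
        hiveContext.sql("SELECT COUNT(*) FROM events_direct").show()

        // Option 2: let the Hive metastore locate the files by table name.
        // The data is still Parquet on HDFS; only the metadata source differs.
        hiveContext.sql(
          """CREATE TABLE IF NOT EXISTS events (id BIGINT, name STRING)
            |STORED AS PARQUET""".stripMargin)
        hiveContext.sql("SELECT COUNT(*) FROM events").show()
      }
    }

Either way the bytes on disk are Parquet; the choice is only about whether
the file paths come from your own code or from the metastore.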