Hi Michael,

Thank you for the explanation. Can you confirm whether the following
statement is true, incomplete, or false?
"hql uses Hive to parse the query and construct the logical plan, whereas
sql is a pure Spark implementation of parsing and logical plan construction.
Once Spark obtains the logical plan, it is executed in Spark regardless of
dialect, although the execution might be different for the same query."

Best Regards,

Jerry


On Tue, Jul 15, 2014 at 6:22 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> hql and sql are just two different dialects for interacting with data.
>  After parsing is complete and the logical plan is constructed, the
> execution is exactly the same.
>
>
> On Tue, Jul 15, 2014 at 2:50 PM, Jerry Lam <chiling...@gmail.com> wrote:
>
>> Hi Michael,
>>
>> I don't understand the difference between hql (HiveContext) and sql
>> (SQLContext). My previous understanding was that hql is Hive specific:
>> unless the table is managed by Hive, we should use sql. For instance, an
>> RDD (hdfsRDD) created from files in HDFS and registered as a table should
>> use sql.
>>
>> However, my current understanding after trying your suggestion above is
>> that I can also query the hdfsRDD using hql via LocalHiveContext. I just
>> tested it, and the lateral view explode(schools) works with the hdfsRDD.
>>
>> It seems to me that HiveContext and SQLContext are the same except
>> that HiveContext needs a metastore and has more powerful SQL support
>> borrowed from Hive. Can you shed some light on this when you get a minute?
>>
>> Thanks,
>>
>> Jerry
>>
>>
>>
>>
>>
>> On Tue, Jul 15, 2014 at 4:32 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>>> No, that is why I included the link to SPARK-2096
>>> <https://issues.apache.org/jira/browse/SPARK-2096> as well.  You'll
>>> need to use HiveQL at this time.
>>>
>>>> Is it possible or planned to support the "schools.time" format to keep
>>>> only the records in which some element of the schools array satisfies
>>>> time > 2?
>>>
>>> It would be great to support something like this, but it's going to take
>>> a while to hammer out the correct semantics, as SQL does not in general have
>>> great support for nested structures.  I think different people might
>>> interpret that query to mean there is SOME school.time > 2 vs. ALL
>>> school.time > 2, etc.
>>>
>>> You can get what you want now using a lateral view:
>>>
>>> hql("SELECT DISTINCT name FROM people LATERAL VIEW explode(schools) s AS school WHERE school.time > 2")
>>>
>>
>>
>
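
For what it's worth, the SOME vs. ALL ambiguity mentioned in the quoted reply can be shown without Spark. The following is a minimal plain-Python sketch, not Spark code: the sample records, the explode_schools helper, and the variable names are all made up for illustration, and it only mimics what LATERAL VIEW explode does row by row.

```python
# Hypothetical sample data; names and values are illustrative only.
people = [
    {"name": "Ann", "schools": [{"name": "A", "time": 1}, {"name": "B", "time": 3}]},
    {"name": "Bob", "schools": [{"name": "C", "time": 4}, {"name": "D", "time": 5}]},
    {"name": "Cat", "schools": [{"name": "E", "time": 1}]},
]


def explode_schools(rows):
    """Yield one (person, school) pair per array element.

    This mimics LATERAL VIEW explode(schools): a person with N schools
    contributes N rows, and a person with no schools contributes none.
    """
    for person in rows:
        for school in person["schools"]:
            yield person, school


# "SOME" reading, which is what the lateral view query in the thread computes:
# keep a name if at least one exploded row has school.time > 2.
some_match = sorted({p["name"] for p, s in explode_schools(people) if s["time"] > 2})

# "ALL" reading: keep a name only if every school has time > 2.
# The non-empty check mirrors the fact that explode drops empty arrays.
all_match = sorted(
    p["name"]
    for p in people
    if p["schools"] and all(s["time"] > 2 for s in p["schools"])
)

print(some_match)  # ['Ann', 'Bob']
print(all_match)   # ['Bob']
```

Ann attended one school with time > 2 and one without, so she appears only under the SOME reading. That is the case where the two interpretations diverge.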
