Hi there, any replies yet? :) -Narayanan
On Fri, Sep 11, 2015 at 1:51 AM, Narayanan K <knarayana...@gmail.com> wrote:
> Hi all,
>
> We are migrating from Hive to Spark. We used the Spark SQL CLI to run our
> Hive queries for performance testing. I am new to Spark and have a few
> questions. Our setup:
>
> 1. Ten boxes in standalone mode, one master and nine slaves. Each box is
>    a launcher to our external Hadoop grid.
> 2. Copied hive-site.xml into the Spark conf directory. The Hive metastore
>    URI is external to our Spark cluster.
> 3. We use the spark-sql CLI on the master host to submit Hive queries
>    directly. The queries hit Hive tables, stored in ORC format, on the
>    remote HDFS cluster.
>
> Questions:
>
> 1. What sequence of steps takes place between submitting an HQL query
>    and its execution on the Spark cluster?
> 2. Is an RDD created to read the ORC files from remote HDFS? Does it get
>    the storage information from the Hive metastore?
> 3. Since the HDFS cluster is remote from the Spark cluster, how is data
>    locality achieved here?
> 4. Does running queries in the Spark SQL CLI against a remote Hive
>    metastore incur any cost in query performance?
> 5. The Spark SQL programming guide says the Spark SQL CLI is only for
>    local mode. What does this mean? We were able to submit hundreds of
>    queries through the CLI. Is there any downside to this approach?
> 6. Is it possible to create one HiveContext, add all UDF jars once, and
>    submit 100 queries with the same HiveContext? (A rough sketch of what
>    we have in mind follows below this message.)
>
> Thanks
> Narayanan
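For question 6, here is a rough sketch of the kind of driver program we
have in mind, assuming the Spark 1.5-era HiveContext API; the jar path,
UDF class name, and queries are hypothetical placeholders, not our real
ones:

    // Sketch: one long-lived HiveContext reused across many queries.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object BatchQueries {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("hive-query-batch")
        val sc = new SparkContext(conf)

        // One HiveContext for the whole batch; it picks up hive-site.xml
        // from the classpath and talks to the remote metastore.
        val hiveContext = new HiveContext(sc)

        // Register UDF jars and functions once, up front.
        hiveContext.sql("ADD JAR /path/to/our-udfs.jar")
        hiveContext.sql(
          "CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUdf'")

        // Reuse the same context for every query in the batch.
        val queries = Seq(
          "SELECT my_udf(col1) FROM db.table1 WHERE dt = '2015-09-10'",
          "SELECT count(*) FROM db.table2")
        for (q <- queries) {
          hiveContext.sql(q).show()
        }

        sc.stop()
      }
    }

We would submit this with spark-submit against the standalone master
rather than going through the spark-sql CLI, so the context and the
registered UDFs stay alive across all the queries.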