Hi there, any replies yet? :) -Narayanan
On Fri, Sep 11, 2015 at 1:51 AM, Narayanan K <knarayana...@gmail.com> wrote:
> Hi all,
>
> We are migrating from Hive to Spark. We used the Spark SQL CLI to run our
> Hive queries for performance testing. I am new to Spark and have a few
> questions. Our setup:
>
> 1. Ten boxes in standalone mode, one master and nine slaves. Each box is
>    a launcher to our external Hadoop grid.
> 2. Copied hive-site.xml into the Spark conf directory. The Hive metastore
>    URI is external to our Spark cluster.
> 3. We use the spark-sql CLI on the master host to submit Hive queries
>    directly. The queries hit Hive tables, stored in ORC format, on the
>    remote HDFS cluster.
>
> Questions:
>
> 1. What sequence of steps takes place between submitting an HQL query
>    and its execution on the Spark cluster?
> 2. Is an RDD created to read the ORC files from remote HDFS? Does it get
>    the storage information from the Hive metastore?
> 3. Since the HDFS cluster is remote from the Spark cluster, how is data
>    locality achieved here?
> 4. Does running queries in the Spark SQL CLI against a remote Hive
>    metastore incur any cost in query performance?
> 5. The Spark SQL programming guide says the Spark SQL CLI is only for
>    local mode. What does this mean? We were able to submit hundreds of
>    queries through the CLI. Is there any downside to this approach?
> 6. Is it possible to create one HiveContext, add all UDF jars once, and
>    submit 100 queries with the same HiveContext? (A rough sketch of what
>    we have in mind follows below this message.)
>
> Thanks
> Narayanan
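For question 6, here is a rough sketch of the kind of driver program we
have in mind, assuming the Spark 1.5-era HiveContext API; the jar path,
UDF class name, and queries are hypothetical placeholders, not our real
ones:

    // Sketch: one long-lived HiveContext reused across many queries.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object BatchQueries {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("hive-query-batch")
        val sc = new SparkContext(conf)

        // One HiveContext for the whole batch; it picks up hive-site.xml
        // from the classpath and talks to the remote metastore.
        val hiveContext = new HiveContext(sc)

        // Register UDF jars and functions once, up front.
        hiveContext.sql("ADD JAR /path/to/our-udfs.jar")
        hiveContext.sql(
          "CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUdf'")

        // Reuse the same context for every query in the batch.
        val queries = Seq(
          "SELECT my_udf(col1) FROM db.table1 WHERE dt = '2015-09-10'",
          "SELECT count(*) FROM db.table2")
        for (q <- queries) {
          hiveContext.sql(q).show()
        }

        sc.stop()
      }
    }

We would submit this with spark-submit against the standalone master
rather than going through the spark-sql CLI, so the context and the
registered UDFs stay alive across all the queries.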