Sorry folks, it is executing Spark jobs rather than Hive jobs. I misread the logs, since there was other activity going on on the cluster.
From: alee...@hotmail.com
To: ar...@sigmoidanalytics.com; tsind...@gmail.com
CC: user@spark.apache.org
Subject: RE: SparkSQL + Tableau Connector
Date: Wed, 11 Feb 2015 11:56:44 -0800

I'm using MySQL as the metastore DB with Spark 1.2. I simply copied the hive-site.xml to /etc/spark/ and added the MySQL JDBC JAR to spark-env.sh in /etc/spark/, and everything works fine now. My setup looks like this:

Tableau => Spark ThriftServer2 => HiveServer2

It's talking to Tableau Desktop 8.3. Interestingly, when I query a Hive table, it still sends Hive queries to HiveServer2, which runs on the MR or Tez engine. Is this expected? I thought it would at least use the Catalyst engine and talk to the underlying HDFS to pull the data into an RDD, like the HiveContext API does. Did I misunderstand the purpose of Spark ThriftServer2?

Date: Wed, 11 Feb 2015 16:07:40 +0530
Subject: Re: SparkSQL + Tableau Connector
From: ar...@sigmoidanalytics.com
To: tsind...@gmail.com
CC: user@spark.apache.org

Hi,

I used this, though it's using an embedded driver and is not a good approach. It works. You can configure some other metastore type as well; I have not tried the metastore URIs.

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/opt/bigdata/spark-1.2.0/metastore_db;create=true</value>
    <description>URL for the DB</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  </property>
  <!--
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://x.x.x.x:10000</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
  -->
</configuration>

On Wed, Feb 11, 2015 at 3:59 PM, Todd Nist <tsind...@gmail.com> wrote:

Hi Arush,

So yes, I want to create the tables through Spark SQL.
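(For comparison with the embedded Derby configuration above, a hive-site.xml pointing at an external MySQL metastore — roughly the setup described at the top of the thread — might look like the sketch below. The host, database name, and credentials are placeholders for your environment, and the MySQL JDBC driver JAR still has to be on the classpath, e.g. via spark-env.sh as described above.)

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- placeholder host and database name; adjust to your environment -->
    <value>jdbc:mysql://metastore-host:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <!-- placeholder credentials -->
    <value>hive</value>
  </property>
</configuration>
```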
I have placed the hive-site.xml file inside of the $SPARK_HOME/conf directory; I thought that was all I should need to do to have the thriftserver use it. Perhaps my hive-site.xml is wrong. It currently looks like this:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Ensure that the following statement points to the Hive Metastore URI in your cluster -->
    <value>thrift://sandbox.hortonworks.com:9083</value>
    <description>URI for client to contact metastore server</description>
  </property>
</configuration>

Which leads me to believe it is going to pull from the Hortonworks metastore thrift service? I will go look at the docs to see if this is right; it is what Hortonworks says to do. Do you have an example hive-site.xml by chance that works with Spark SQL?

I am using 8.3 of Tableau with the SparkSQL Connector.

Thanks for the assistance.

-Todd

On Wed, Feb 11, 2015 at 2:34 AM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:

BTW, what Tableau connector are you using?

On Wed, Feb 11, 2015 at 12:55 PM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:

I am a little confused here: why do you want to create the tables in Hive? You want to create the tables in spark-sql, right?

If you are not able to find the same tables through Tableau, then thrift is connecting to a different metastore than your spark-shell. One way to specify a metastore to thrift is to provide the path to hive-site.xml while starting thrift, using --files hive-site.xml. Similarly, you can specify the same metastore to your spark-submit or spark-shell using the same option.

On Wed, Feb 11, 2015 at 5:23 AM, Todd Nist <tsind...@gmail.com> wrote:

Arush,

As for #2, do you mean something like this from the docs:

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")

// Queries are expressed in HiveQL
sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)

Or did you have something else in mind?

-Todd

On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist <tsind...@gmail.com> wrote:

Arush,

Thank you, I will take a look at that approach in the morning. I sort of figured the answer to #1 was NO and that I would need to do 2 and 3; thanks for clarifying it for me.

-Todd

On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:

1. Can the connector fetch or query schemaRDDs saved to Parquet or JSON files?

No.

2. Do I need to do something to expose these via hive / metastore other than creating a table in hive?

Create a table in Spark SQL to expose it via Spark SQL.

3. Does the thriftserver need to be configured to expose these in some fashion, sort of related to question 2?

You would need to configure thrift to read from the metastore you expect it to read from - by default it reads from the metastore_db directory present in the directory used to launch the thrift server.

On 11 Feb 2015 01:35, "Todd Nist" <tsind...@gmail.com> wrote:

Hi,

I'm trying to understand how and what the Tableau connector to SparkSQL is able to access. My understanding is that it needs to connect to the thriftserver, and I am not sure how or if it exposes parquet, json, or schemaRDDs, or whether it only exposes schemas defined in the metastore / hive.

For example, I do the following from the spark-shell, which generates a schemaRDD from a csv file and saves it as a JSON file as well as a parquet file.
import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._

val sqlContext = new SQLContext(sc)
val test = sqlContext.csvFile("/data/test.csv")
test.toJSON.saveAsTextFile("/data/out.json")
test.saveAsParquetFile("/data/out.parquet")

When I connect from Tableau, the only thing I see is the "default" schema and nothing in the tables section.

So my questions are:

1. Can the connector fetch or query schemaRDDs saved to Parquet or JSON files?
2. Do I need to do something to expose these via hive / metastore other than creating a table in hive?
3. Does the thriftserver need to be configured to expose these in some fashion, sort of related to question 2?

TIA for the assistance.

-Todd

--
Arush Kharbanda || Technical team lead
ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
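Putting the thread's answers together: a rough sketch (untested, assuming Spark 1.2 with the spark-csv package, and that the Thrift server was started against the same hive-site.xml, e.g. via `sbin/start-thriftserver.sh --files hive-site.xml` as suggested above) of exposing the CSV data to Tableau by registering it in the metastore through a HiveContext instead of a plain SQLContext. The table name is an arbitrary example:

```scala
import org.apache.spark.sql.hive.HiveContext
import com.databricks.spark.csv._

// A HiveContext records table metadata in the configured metastore,
// which is what the Thrift server (and therefore Tableau) reads.
val hiveContext = new HiveContext(sc)

// Load the CSV as before, but through the HiveContext.
val test = hiveContext.csvFile("/data/test.csv")

// Persist the schemaRDD as a metastore table; "test_table" is an
// arbitrary example name. Once the Thrift server shares this
// metastore, the table should appear in Tableau's table list.
test.saveAsTable("test_table")
```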