Sorry folks, it is executing Spark jobs rather than Hive jobs. I misread the logs, since there was other activity going on on the cluster.
From: alee...@hotmail.com
To: ar...@sigmoidanalytics.com; tsind...@gmail.com
CC: user@spark.apache.org
Subject: RE: SparkSQL + Tableau Connector
Date: Wed, 11 Feb 2015 11:56:44 -0800

I'm using MySQL as the metastore DB with Spark 1.2. I simply copied the hive-site.xml to /etc/spark/ and added the MySQL JDBC JAR to spark-env.sh in /etc/spark/, and everything works fine now. My setup looks like this:

Tableau => Spark ThriftServer2 => HiveServer2

It's talking to Tableau Desktop 8.3. Interestingly, when I query a Hive table, it still sends Hive queries to HiveServer2, which runs on the MR or Tez engine. Is this expected? I thought it would at least use the Catalyst engine and talk to the underlying HDFS to pull the data into an RDD, like the HiveContext API does. Did I misunderstand the purpose of Spark ThriftServer2?

Date: Wed, 11 Feb 2015 16:07:40 +0530
Subject: Re: SparkSQL + Tableau Connector
From: ar...@sigmoidanalytics.com
To: tsind...@gmail.com
CC: user@spark.apache.org

Hi,

I used this, though it's using an embedded driver and is not a good approach. It works. You can configure some other metastore type as well; I have not tried the metastore URIs.

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/opt/bigdata/spark-1.2.0/metastore_db;create=true</value>
    <description>URL for the DB</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  </property>
  <!--
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://x.x.x.x:10000</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
  -->
</configuration>

On Wed, Feb 11, 2015 at 3:59 PM, Todd Nist <tsind...@gmail.com> wrote:

Hi Arush,

So yes, I want to create the tables through Spark SQL.
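(For comparison with the embedded Derby configuration above, a hive-site.xml pointing at an external MySQL metastore — roughly the setup described at the top of the thread — might look like the sketch below. The host, database name, and credentials are placeholders for your environment, and the MySQL JDBC driver JAR still has to be on the classpath, e.g. via spark-env.sh as described above.)

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- placeholder host and database name; adjust to your environment -->
    <value>jdbc:mysql://metastore-host:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <!-- placeholder credentials -->
    <value>hive</value>
  </property>
</configuration>
```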
I have placed the hive-site.xml file inside of the $SPARK_HOME/conf directory; I thought that was all I should need to do to have the thriftserver use it. Perhaps my hive-site.xml is wrong. It currently looks like this:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Ensure that the following statement points to the Hive Metastore URI in your cluster -->
    <value>thrift://sandbox.hortonworks.com:9083</value>
    <description>URI for client to contact metastore server</description>
  </property>
</configuration>

Which leads me to believe it is going to pull from the Hortonworks metastore thrift service? I will go look at the docs to see if this is right; it is what Hortonworks says to do. Do you have an example hive-site.xml by chance that works with Spark SQL?

I am using 8.3 of Tableau with the SparkSQL Connector.

Thanks for the assistance.

-Todd

On Wed, Feb 11, 2015 at 2:34 AM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:

BTW, what Tableau connector are you using?

On Wed, Feb 11, 2015 at 12:55 PM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:

I am a little confused here: why do you want to create the tables in Hive? You want to create the tables in spark-sql, right?

If you are not able to find the same tables through Tableau, then thrift is connecting to a different metastore than your spark-shell. One way to specify a metastore to thrift is to provide the path to hive-site.xml while starting thrift, using --files hive-site.xml. Similarly, you can specify the same metastore to your spark-submit or spark-shell using the same option.

On Wed, Feb 11, 2015 at 5:23 AM, Todd Nist <tsind...@gmail.com> wrote:

Arush,

As for #2, do you mean something like this from the docs:

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")

// Queries are expressed in HiveQL
sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)

Or did you have something else in mind?

-Todd

On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist <tsind...@gmail.com> wrote:

Arush,

Thank you, I will take a look at that approach in the morning. I sort of figured the answer to #1 was NO and that I would need to do 2 and 3; thanks for clarifying it for me.

-Todd

On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda <ar...@sigmoidanalytics.com> wrote:

1. Can the connector fetch or query schemaRDDs saved to Parquet or JSON files?

No.

2. Do I need to do something to expose these via hive / metastore other than creating a table in hive?

Create a table in Spark SQL to expose it via Spark SQL.

3. Does the thriftserver need to be configured to expose these in some fashion, sort of related to question 2?

You would need to configure thrift to read from the metastore you expect it to read from - by default it reads from the metastore_db directory present in the directory used to launch the thrift server.

On 11 Feb 2015 01:35, "Todd Nist" <tsind...@gmail.com> wrote:

Hi,

I'm trying to understand how and what the Tableau connector to SparkSQL is able to access. My understanding is that it needs to connect to the thriftserver, and I am not sure how or if it exposes parquet, json, or schemaRDDs, or whether it only exposes schemas defined in the metastore / hive.

For example, I do the following from the spark-shell, which generates a schemaRDD from a csv file and saves it as a JSON file as well as a parquet file.
import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._

val sqlContext = new SQLContext(sc)
val test = sqlContext.csvFile("/data/test.csv")
test.toJSON.saveAsTextFile("/data/out.json")
test.saveAsParquetFile("/data/out.parquet")

When I connect from Tableau, the only thing I see is the "default" schema and nothing in the tables section.

So my questions are:

1. Can the connector fetch or query schemaRDDs saved to Parquet or JSON files?
2. Do I need to do something to expose these via hive / metastore other than creating a table in hive?
3. Does the thriftserver need to be configured to expose these in some fashion, sort of related to question 2?

TIA for the assistance.

-Todd

--
Arush Kharbanda || Technical team lead
ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
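Putting the thread's answers together: a rough sketch (untested, assuming Spark 1.2 with the spark-csv package, and that the Thrift server was started against the same hive-site.xml, e.g. via `sbin/start-thriftserver.sh --files hive-site.xml` as suggested above) of exposing the CSV data to Tableau by registering it in the metastore through a HiveContext instead of a plain SQLContext. The table name is an arbitrary example:

```scala
import org.apache.spark.sql.hive.HiveContext
import com.databricks.spark.csv._

// A HiveContext records table metadata in the configured metastore,
// which is what the Thrift server (and therefore Tableau) reads.
val hiveContext = new HiveContext(sc)

// Load the CSV as before, but through the HiveContext.
val test = hiveContext.csvFile("/data/test.csv")

// Persist the schemaRDD as a metastore table; "test_table" is an
// arbitrary example name. Once the Thrift server shares this
// metastore, the table should appear in Tableau's table list.
test.saveAsTable("test_table")
```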