Re: SparkSQL + Tableau Connector

Silvio Fiorito Tue, 10 Feb 2015 16:24:56 -0800

Todd,

I just tried it in the bin/spark-sql shell. I created a folder named json and 
put 2 copies of the same people.json file in it.

This is what I ran:

spark-sql> create temporary table people
         > using org.apache.spark.sql.json
         > options (path 'examples/src/main/resources/json/*')
         > ;
Time taken: 0.34 seconds
spark-sql> select * from people;
NULL    Michael
30  Andy
19  Justin
NULL    Michael
30  Andy
19  Justin
Time taken: 0.576 seconds
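
For comparison, here is a rough spark-shell sketch of the same glob read. It 
assumes a Spark 1.2-era shell where `sc` is already defined and the same json 
folder as above exists; it is an illustration, not something run in this thread.

```scala
// Sketch only: assumes spark-shell on Spark 1.2.x, with `sc` provided by the shell.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// jsonFile accepts a directory or glob, so both copies of people.json are read.
val people = sqlContext.jsonFile("examples/src/main/resources/json/*")

// Register the SchemaRDD so it can be queried by name within this SQLContext.
people.registerTempTable("people")

sqlContext.sql("select * from people").collect().foreach(println)
```

Note that a temporary table registered this way is visible only to the 
SQLContext that created it.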

From: Todd Nist
Date: Tuesday, February 10, 2015 at 6:49 PM
To: Silvio Fiorito
Cc: "[email protected]<mailto:[email protected]>"
Subject: Re: SparkSQL + Tableau Connector

Hi Silvio,

Ah, I like that. There is a section in Tableau for "Initial SQL" to be executed 
upon connecting, and this would fit well there.  I guess I will need to issue a 
collect(), coalesce(1,true).saveAsTextFile(...), or use repartition(1), as the 
file is currently being broken into multiple parts.  While this works in the 
spark-shell:

val test = sqlContext.jsonFile("/data/out/")  // returns all parts back as one

It seems to fail in just spark-sql:

create temporary table test
using org.apache.spark.sql.json
options (path '/data/out/')
cache table test

with:

[Simba][SparkODBC] (35) Error from Spark: error code: '0' error message: 
'org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: create 
temporary table test using
org.apache.spark.sql.json
options (path '/data/out/')
cache table test'.

Initial SQL Error. Check that the syntax is correct and that you have access 
privileges to the requested database.

Thanks again for the suggestion. I will work with it a bit more tomorrow.
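
A minimal sketch of the single-part write mentioned above, assuming a Spark 
1.2-era spark-shell with `sc` defined. The output path `/data/out-single` is a 
hypothetical name chosen for illustration, not a path from this thread.

```scala
// Sketch only: assumes spark-shell on Spark 1.2.x, with `sc` provided by the shell.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Read every part file under the directory back as one SchemaRDD.
val test = sqlContext.jsonFile("/data/out/")

// coalesce(1, shuffle = true) collapses the data into one partition;
// repartition(1) is shorthand for the same call.
// "/data/out-single" is a hypothetical output path.
test.toJSON
  .coalesce(1, shuffle = true)
  .saveAsTextFile("/data/out-single")
```

saveAsTextFile still writes a directory, but with one partition it should 
contain a single part file.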

-Todd

On Tue, Feb 10, 2015 at 5:48 PM, Silvio Fiorito 
<[email protected]<mailto:[email protected]>> wrote:
Hi Todd,

What you could do is run some SparkSQL commands immediately after the Thrift 
server starts up. Or does Tableau have some init SQL commands you could run?

You can actually load data using SQL, such as:

create temporary table people using org.apache.spark.sql.json options (path 
'examples/src/main/resources/people.json')
cache table people

create temporary table users using org.apache.spark.sql.parquet options (path 
'examples/src/main/resources/users.parquet')
cache table users

From: Todd Nist
Date: Tuesday, February 10, 2015 at 3:03 PM
To: "[email protected]<mailto:[email protected]>"
Subject: SparkSQL + Tableau Connector

Hi,

I'm trying to understand what the Tableau connector to SparkSQL is able to 
access, and how.  My understanding is that it needs to connect to the 
thriftserver, but I am not sure whether it exposes parquet, json, and 
schemaRDDs, or only schemas defined in the metastore / hive.

For example, I do the following from the spark-shell which generates a 
schemaRDD from a csv file and saves it as a JSON file as well as a parquet file.

import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._

val sqlContext = new SQLContext(sc)
val test = sqlContext.csvFile("/data/test.csv")
test.toJSON.saveAsTextFile("/data/out")
test.saveAsParquetFile("/data/out")

When I connect from Tableau, the only thing I see is the "default" schema and 
nothing in the tables section.

So my questions are:

1.  Can the connector fetch or query schemaRDD's saved to Parquet or JSON files?
2.  Do I need to do something to expose these via hive / metastore other than 
creating a table in hive?
3.  Does the thriftserver need to be configured to expose these in some 
fashion? This is somewhat related to question 2.

TIA for the assistance.

-Todd
