Todd,
I just tried it in the bin/spark-sql shell. I created a folder named json and put 2
copies of the same people.json file in it.
This is what I ran:
spark-sql> create temporary table people
> using org.apache.spark.sql.json
> options (path 'examples/src/main/resources/json/*')
> ;
Time taken: 0.34 seconds
spark-sql> select * from people;
NULL Michael
30 Andy
19 Justin
NULL Michael
30 Andy
19 Justin
Time taken: 0.576 seconds
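For comparison, the same glob load can be sketched from the spark-shell. This assumes the Spark 1.2 SQLContext API and the predefined `sc`; the table name simply mirrors the one above.

```scala
// Sketch for the Spark 1.2 spark-shell, where `sc` is predefined.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// jsonFile accepts the same glob, so both copies of people.json are read.
val people = sqlContext.jsonFile("examples/src/main/resources/json/*")

// Temporary tables are scoped to this SQLContext.
people.registerTempTable("people")

sqlContext.sql("select * from people").collect().foreach(println)
```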
From: Todd Nist
Date: Tuesday, February 10, 2015 at 6:49 PM
To: Silvio Fiorito
Cc: "[email protected]"
Subject: Re: SparkSQL + Tableau Connector
Hi Silvio,
Ah, I like that. There is a section in Tableau for "Initial SQL" to be executed
upon connecting; this would fit well there. I guess I will need to issue a
collect(), coalesce(1,true).saveAsTextFile(...) or use repartition(1), as the
file is currently being broken into multiple parts. While this works in the
spark-shell:
val test = sqlContext.jsonFile("/data/out/") // returns all parts back as one
It seems to fail in just spark-sql:
create temporary table test
using org.apache.spark.sql.json
options (path '/data/out/')
cache table test
with:
[Simba][SparkODBC] (35) Error from Spark: error code: '0' error message:
'org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: create
temporary table test using
org.apache.spark.sql.json
options (path '/data/out/')
cache table test'.
Initial SQL Error. Check that the syntax is correct and that you have access
privileges to the requested database.
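For the single-file output mentioned above, a minimal sketch, assuming the Spark 1.2 spark-shell (where `sc` is predefined). The output path /data/out-single is hypothetical and chosen only for illustration.

```scala
// Sketch for the Spark 1.2 spark-shell, where `sc` is predefined.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Reads every part file under the directory as one SchemaRDD.
val test = sqlContext.jsonFile("/data/out/")

// Collapse to a single partition so only one part file is written.
// "/data/out-single" is a hypothetical path; it must not already exist.
test.toJSON.coalesce(1, shuffle = true).saveAsTextFile("/data/out-single")
```

repartition(1) is shorthand for coalesce(1, shuffle = true). Either way the result is still a directory, just with a single part file inside it.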
Thanks again for the suggestion; I will work with it a bit more
tomorrow.
-Todd
On Tue, Feb 10, 2015 at 5:48 PM, Silvio Fiorito
<[email protected]> wrote:
Hi Todd,
What you could do is run some SparkSQL commands immediately after the Thrift
server starts up. Or does Tableau have some init SQL commands you could run?
You can actually load data using SQL, such as:
create temporary table people using org.apache.spark.sql.json options (path
'examples/src/main/resources/people.json')
cache table people
create temporary table users using org.apache.spark.sql.parquet options (path
'examples/src/main/resources/users.parquet')
cache table users
From: Todd Nist
Date: Tuesday, February 10, 2015 at 3:03 PM
To: "[email protected]"
Subject: SparkSQL + Tableau Connector
Hi,
I'm trying to understand how and what the Tableau connector to SparkSQL is able
to access. My understanding is that it needs to connect to the thriftserver, and I
am not sure how or if it exposes parquet, json, or schemaRDDs, or whether it only
exposes schemas defined in the metastore / hive.
For example, I do the following from the spark-shell which generates a
schemaRDD from a csv file and saves it as a JSON file as well as a parquet file.
import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._
val sqlContext = new SQLContext(sc)
val test = sqlContext.csvFile("/data/test.csv")
test.toJSON.saveAsTextFile("/data/out")
test.saveAsParquetFile("/data/out")
When I connect from Tableau, the only thing I see is the "default" schema and
nothing in the tables section.
So my questions are:
1. Can the connector fetch or query schemaRDDs saved to Parquet or JSON files?
2. Do I need to do something to expose these via hive / metastore other than
creating a table in hive?
3. Does the thriftserver need to be configured to expose these in some
fashion? This is sort of related to question 2.
TIA for the assistance.
-Todd