Hi Silvio,
Ah, I like that. Tableau has an "Initial SQL" section that is executed
upon connecting, so this would fit well there. I guess I will need to
issue a collect(), use coalesce(1,true).saveAsTextFile(...), or use
repartition(1), as the output is currently being broken into multiple parts.
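A minimal sketch of the single-part write from the spark-shell, assuming the Spark 1.2-era SchemaRDD API and the spark-csv package. The output path /data/out-single is a hypothetical name picked only so that it does not collide with /data/out:

```scala
// spark-shell sketch; `sc` is the SparkContext the shell provides.
import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._

val sqlContext = new SQLContext(sc)
val test = sqlContext.csvFile("/data/test.csv")

// Collapse to a single partition before writing, so the output
// directory contains one part file instead of many.
// "/data/out-single" is a hypothetical path used for illustration.
test.toJSON.coalesce(1, shuffle = true).saveAsTextFile("/data/out-single")

// repartition(1) is shorthand for coalesce(1, shuffle = true):
// test.toJSON.repartition(1).saveAsTextFile("/data/out-single")
```

The result is still a directory holding one part file, not a bare file. A collect() should not be needed for this, since it would pull all the rows back to the driver.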
While this works in the spark-shell:
val test = sqlContext.jsonFile("/data/out/") // returns all parts back as one
It seems to fail in just spark-sql:
create temporary table test
using org.apache.spark.sql.json
options (path '/data/out/')
cache table test
with:
[Simba][SparkODBC] (35) Error from Spark: error code: '0' error message:
'org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: create
temporary table test using
org.apache.spark.sql.json
options (path '/data/out/')
cache table test'.
Initial SQL Error. Check that the syntax is correct and that you have
access privileges to the requested database.
Thanks again for the suggestion. I will work with it a bit more
tomorrow.
-Todd
On Tue, Feb 10, 2015 at 5:48 PM, Silvio Fiorito <
[email protected]> wrote:
> Hi Todd,
>
> What you could do is run some SparkSQL commands immediately after the
> Thrift server starts up. Or does Tableau have some init SQL commands you
> could run?
>
>
> You can actually load data using SQL, such as:
>
> create temporary table people using org.apache.spark.sql.json options
> (path 'examples/src/main/resources/people.json')
> cache table people
>
> create temporary table users using org.apache.spark.sql.parquet options
> (path 'examples/src/main/resources/users.parquet')
> cache table users
>
> From: Todd Nist
> Date: Tuesday, February 10, 2015 at 3:03 PM
> To: "[email protected]"
> Subject: SparkSQL + Tableau Connector
>
> Hi,
>
> I'm trying to understand what the Tableau connector to SparkSQL is able
> to access, and how. My understanding is that it needs to connect to the
> thriftserver, but I am not sure whether it exposes parquet, json, and
> schemaRDDs, or only schemas defined in the metastore / hive.
>
>
> For example, I do the following from the spark-shell which generates a
> schemaRDD from a csv file and saves it as a JSON file as well as a parquet
> file.
>
> import org.apache.spark.sql.SQLContext
> import com.databricks.spark.csv._
> val sqlContext = new SQLContext(sc)
> val test = sqlContext.csvFile("/data/test.csv")
> test.toJSON.saveAsTextFile("/data/out")
> test.saveAsParquetFile("/data/out")
>
> When I connect from Tableau, the only thing I see is the "default"
> schema and nothing in the tables section.
>
> So my questions are:
>
> 1. Can the connector fetch or query schemaRDDs saved to Parquet or JSON
> files?
> 2. Do I need to do something to expose these via hive / metastore other
> than creating a table in hive?
> 3. Does the thriftserver need to be configured to expose these in some
> fashion (sort of related to question 2)?
>
> TIA for the assistance.
>
> -Todd
>