Hello,

I want to expose the results of a Spark computation to external tools. I plan
to do this through the Thrift server's JDBC interface, by registering the
result DataFrame as a temp table.
I wrote a sample program in spark-shell to test this.

import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext.implicits._
HiveThriftServer2.startWithContext(hiveContext)
val myDF = hiveContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/datafolder/weblog/pages.csv")
myDF.registerTempTable("temp_table")


I'm able to see the temp table in Beeline:

+-------------+--------------+
|  tableName  | isTemporary  |
+-------------+--------------+
| temp_table  | true         |
| my_table    | false        |
+-------------+--------------+
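For reference, the Beeline session was opened against the Thrift server started above. The host and port here are assumptions (the HiveServer2 defaults), not something I've changed:

```shell
# Connect Beeline to the Thrift server started via startWithContext.
# localhost:10000 is the default HiveServer2 port; adjust if configured otherwise.
beeline -u jdbc:hive2://localhost:10000

# Then, inside the Beeline session:
#   show tables;
```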


Now when I issue "select * from temp_table" from Beeline, I see the exception
below in spark-shell:

15/07/13 17:18:27 WARN ThriftCLIService: Error executing statement:
org.apache.hive.service.cli.HiveSQLException: java.lang.ClassNotFoundException:
com.databricks.spark.csv.CsvRelation$$anonfun$buildScan$1$$anonfun$1
        at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:206)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
I'm able to read the other table ("my_table") from Beeline, though.
Any suggestions on how to overcome this?

This is with the Spark 1.4 pre-built distribution. spark-shell was started
with --packages to pull in spark-csv.
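In case it matters, the shell was launched roughly like this. The exact spark-csv version and Scala suffix below are assumptions on my part; substitute whatever coordinates were actually used:

```shell
# Start spark-shell with the spark-csv package resolved from Maven.
# Artifact coordinates are an assumption for Spark 1.4 / Scala 2.10.
spark-shell --packages com.databricks:spark-csv_2.10:1.0.3
```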

Srikanth
