Hi Gerard,

I’ve never had an issue using the HiveContext without a hive-site.xml 
configured. However, one issue you may run into is that if multiple users 
start the HiveContext from the same path, they’ll all try to store the default 
Derby metastore in the same location. Also, if you want them to be able to 
persist permanent table metadata for Spark SQL, you’ll want to set up a real 
metastore.
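
For example, here’s a minimal sketch of giving each user their own Derby 
metastore path. The /tmp location and app name are made up, and it assumes the 
metastore connection is only opened on first use, so setting the JDO URL on 
the context before any query takes effect; otherwise the same property can go 
in a per-user hive-site.xml:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("per-user-metastore"))
val hiveContext = new HiveContext(sc)

// Point Derby at a per-user directory instead of the shared ./metastore_db
// it would otherwise create in the launch directory (path is illustrative).
hiveContext.setConf("javax.jdo.option.ConnectionURL",
  s"jdbc:derby:;databaseName=/tmp/metastore_${sys.props("user.name")};create=true")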

The other possibility is Hive dependency collisions on the classpath, but that 
shouldn’t be an issue since you said it’s standalone (not a Hadoop distro, 
right?).

Thanks,
Silvio

From: Gerard Maas <gerard.m...@gmail.com>
Date: Thursday, May 26, 2016 at 5:28 AM
To: spark users <user@spark.apache.org>
Subject: HiveContext standalone => without a Hive metastore

Hi,

I'm helping some folks set up an analytics cluster with Spark.
They want to use the HiveContext to enable window functions on 
DataFrames (*), but they don't have any Hive installation, nor do they need 
one at the moment (unless it's necessary for this feature).

When we try to create a Hive context, we get the following error:

> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
       at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)

Is my HiveContext failing b/c it wants to connect to an unconfigured Hive 
Metastore?

Is there a way to instantiate a HiveContext for the sake of Window support 
without an underlying Hive deployment?

The docs are explicit in saying that this should be the case: [1]

"To use a HiveContext, you do not need to have an existing Hive setup, and all 
of the data sources available to aSQLContext are still available. HiveContext 
is only packaged separately to avoid including all of Hive’s dependencies in 
the default Spark build."

So what is the right way to address this issue? How do I instantiate a 
HiveContext with Spark running on an HDFS cluster without Hive deployed?


Thanks a lot!

-Gerard.

(*) The need for a HiveContext to use Window functions is pretty obscure. The 
only documentation of this seems to be a runtime exception: 
"org.apache.spark.sql.AnalysisException: Could not resolve window function 
'max'. Note that, using window functions currently requires a HiveContext;"
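
For concreteness, this is roughly the kind of code that triggers it (a sketch 
against the Spark 1.x DataFrame API; the data and column names are invented):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.max

// A simple window aggregate: max(value) per key. With a plain SQLContext this
// throws the AnalysisException quoted above; with a HiveContext it resolves.
val df = hiveContext.createDataFrame(Seq(("a", 1), ("a", 3), ("b", 2)))
  .toDF("key", "value")

val byKey = Window.partitionBy("key")
df.withColumn("max_value", max(df("value")).over(byKey)).show()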

[1] 
http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
