Hi, I'm trying to get Spark 1.5.1 to work with Hive 0.13.1. I set the following properties in spark-defaults.conf:
spark.sql.hive.metastore.version   0.13.1
spark.sql.hive.metastore.jars      /usr/lib/hadoop/client/*:/opt/hive/current/lib/*

but I get the following exception when launching the shell:

java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: com/google/common/base/Preconditions when creating Hive client using classpath: ...
Please make sure that jars for your version of hive and hadoop are included in the paths passed to SQLConfEntry(key = spark.sql.hive.metastore.jars, defaultValue=builtin, doc=
  Location of the jars that should be used to instantiate the HiveMetastoreClient. This property can be one of three options:
  1. "builtin" Use Hive 1.2.1, which is bundled with the Spark assembly jar when <code>-Phive</code> is enabled. When this option is chosen, <code>spark.sql.hive.metastore.version</code> must be either <code>1.2.1</code> or not defined.
  2. "maven" Use Hive jars of specified version downloaded from Maven repositories.
  3. A classpath in the standard format for both Hive and Hadoop.
, isPublic = true).
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.liftedTree1$1(IsolatedClientLoader.scala:189)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.<init>(IsolatedClientLoader.scala:179)
        at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:263)
        at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:185)
        at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:392)
        at org.apache.spark.sql.SQLContext$$anonfun$5.apply(SQLContext.scala:235)
        at org.apache.spark.sql.SQLContext$$anonfun$5.apply(SQLContext.scala:234)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:234)
        at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:72)

I cut the classpath portion for brevity, but it does include guava-11.0.2.jar, so in theory that class should be available.

It works if I add the guava jar to the driver's classpath:

spark.driver.extraClassPath   /usr/lib/hadoop/client/guava-11.0.2.jar

That covers most of my use cases, but it breaks whenever my own assembly jar depends on a different version of guava: the job then fails at runtime because of incompatible guava classes. So it sounds like I shouldn't be setting spark.driver.extraClassPath for this.

Am I doing something wrong? Is this a bug in Spark? Could it be somehow related to the shading of guava that Spark does? The following line seems suspicious, because it basically says that guava classes should be loaded by the regular Spark class loader, and yet that loader cannot find them (see the P.S. below for my reading of that check):

https://github.com/apache/spark/blob/v1.5.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L135

Thanks,
- Sebastien
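P.S. For context, here is roughly how I read the shared-class check at that line. This is my own paraphrase, not the exact Spark source: class names matching a handful of prefixes, including "com.google", are delegated to Spark's regular class loader rather than to the isolated loader built for the Hive client.

object SharedClassCheck {
  // Rough paraphrase of the prefix-based check (not the exact Spark code):
  // classes whose names match these prefixes are considered "shared" and are
  // loaded by the regular Spark class loader, not the isolated Hive loader.
  def isSharedClass(name: String): Boolean =
    name.startsWith("org.apache.spark.") ||
    name.startsWith("scala.") ||
    name.startsWith("com.google") ||   // guava matches here, so it should be "shared"
    name.startsWith("java.lang.") ||
    name.startsWith("java.net")

  def main(args: Array[String]): Unit = {
    // The class from the NoClassDefFoundError above would be treated as shared:
    println(isSharedClass("com.google.common.base.Preconditions")) // prints: true
  }
}

If that reading is right, Preconditions should be served by Spark's own class loader, which makes the ClassNotFoundException even more confusing to me.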