Hi, I'm trying to get Spark 1.5.1 to work with Hive 0.13.1. I set the following properties in spark-defaults.conf:
spark.sql.hive.metastore.version   0.13.1
spark.sql.hive.metastore.jars      /usr/lib/hadoop/client/*:/opt/hive/current/lib/*

but I get the following exception when launching the shell:

java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: com/google/common/base/Preconditions when creating Hive client using classpath: ...
Please make sure that jars for your version of hive and hadoop are included in the paths passed to SQLConfEntry(key = spark.sql.hive.metastore.jars, defaultValue=builtin, doc=
  Location of the jars that should be used to instantiate the HiveMetastoreClient. This property can be one of three options:
  1. "builtin" Use Hive 1.2.1, which is bundled with the Spark assembly jar when <code>-Phive</code> is enabled. When this option is chosen, <code>spark.sql.hive.metastore.version</code> must be either <code>1.2.1</code> or not defined.
  2. "maven" Use Hive jars of specified version downloaded from Maven repositories.
  3. A classpath in the standard format for both Hive and Hadoop.
, isPublic = true).
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.liftedTree1$1(IsolatedClientLoader.scala:189)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.<init>(IsolatedClientLoader.scala:179)
        at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:263)
        at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:185)
        at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:392)
        at org.apache.spark.sql.SQLContext$$anonfun$5.apply(SQLContext.scala:235)
        at org.apache.spark.sql.SQLContext$$anonfun$5.apply(SQLContext.scala:234)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:234)
        at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:72)

I cut the classpath portion for brevity, but it does include guava-11.0.2.jar, so in theory that class should be available.

It works if I add the guava jar to the driver's classpath:

spark.driver.extraClassPath   /usr/lib/hadoop/client/guava-11.0.2.jar

That covers most of my use cases, but it breaks whenever my own assembly jar depends on a different version of guava: the job then fails at runtime because of incompatible guava classes. So it sounds like I shouldn't be setting spark.driver.extraClassPath for this.

Am I doing something wrong? Is this a bug in Spark? Could it be somehow related to the shading of guava that Spark does? The following line seems suspicious, because it basically says that guava classes should be loaded by the regular Spark class loader, and yet that loader cannot find them (see the P.S. below for my reading of that check):

https://github.com/apache/spark/blob/v1.5.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L135

Thanks,
- Sebastien
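P.S. For context, here is roughly how I read the shared-class check at that line. This is my own paraphrase, not the exact Spark source: class names matching a handful of prefixes, including "com.google", are delegated to Spark's regular class loader rather than to the isolated loader built for the Hive client.

object SharedClassCheck {
  // Rough paraphrase of the prefix-based check (not the exact Spark code):
  // classes whose names match these prefixes are considered "shared" and are
  // loaded by the regular Spark class loader, not the isolated Hive loader.
  def isSharedClass(name: String): Boolean =
    name.startsWith("org.apache.spark.") ||
    name.startsWith("scala.") ||
    name.startsWith("com.google") ||   // guava matches here, so it should be "shared"
    name.startsWith("java.lang.") ||
    name.startsWith("java.net")

  def main(args: Array[String]): Unit = {
    // The class from the NoClassDefFoundError above would be treated as shared:
    println(isSharedClass("com.google.common.base.Preconditions")) // prints: true
  }
}

If that reading is right, Preconditions should be served by Spark's own class loader, which makes the ClassNotFoundException even more confusing to me.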