Thanks! I have a very similar setup. I have built Spark with -Phive, which includes the hive-2.3.7 jars, the spark-hive* jars, and some hadoop-common* jars.
At runtime, I set SPARK_DIST_CLASSPATH=$(hadoop classpath), and set spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars to $HIVE_HOME/lib/*. With this, I am able to read from and write to Hive successfully from my Spark jobs (a rough sketch of this configuration is at the bottom of this message).

So my question and doubt is the same as yours: is it just working by chance? How and when does Spark use the hive-2.3.7* jars as opposed to the metastore jars? What if my Hive tables use some serdes and functions from my Hive 3.x cluster? How will Spark be able to use them at runtime?

Hope someone has a clear understanding of how Spark works with Hive.

On Thu, Oct 22, 2020 at 12:48 PM Kimahriman <adam...@gmail.com> wrote:

> I have always been a little confused about the different hive-version
> integration as well. To expand on this question, we have a Hive 3.1.1
> metastore that we can successfully interact with using the -Phive profile
> with Hive 2.3.7. We do not use the Hive 3.1.1 jars anywhere in our Spark
> applications. Are we just lucky that the 2.3.7 jars are compatible for our
> use cases with the 3.1.1 metastore? Or are the
> `spark.sql.hive.metastore.jars` only used if you are using a direct JDBC
> connection and acting as the metastore?
>
> Also FWIW, the documentation only claims compatibility up to Hive version
> 3.1.2. Not sure if there are any breaking changes in 3.2 and beyond.
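
For reference, here is a minimal sketch of what my runtime setup looks like. The version string, paths, and table name are placeholders for my environment, and I normally pass the two metastore settings via spark-defaults.conf or --conf rather than in code:

    // Set before launching Spark, so it picks up the cluster's Hadoop jars:
    //   export SPARK_DIST_CLASSPATH=$(hadoop classpath)

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hive-metastore-sketch")
      // Version of the Hive metastore the client should talk to
      // (placeholder; match your cluster's metastore version).
      .config("spark.sql.hive.metastore.version", "3.1.1")
      // Classpath to load the isolated metastore client jars from,
      // instead of the builtin hive-2.3.7 jars ($HIVE_HOME/lib/* here).
      .config("spark.sql.hive.metastore.jars", "/opt/hive/lib/*")
      .enableHiveSupport()
      .getOrCreate()

    // Simple smoke test that the metastore connection works
    // (some_db.some_table is a hypothetical table).
    spark.sql("SHOW DATABASES").show()
    spark.sql("SELECT * FROM some_db.some_table LIMIT 10").show()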