Thanks! I have a very similar setup. I built Spark with -Phive, which
includes the hive-2.3.7 jars, the spark-hive* jars, and some hadoop-common* jars.
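
For reference, the build was something along these lines (exact invocation
from memory, so details may differ):

    ./build/mvn -DskipTests -Phive clean package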

At runtime, I set SPARK_DIST_CLASSPATH=$(hadoop classpath), and I set
spark.sql.hive.metastore.version to my metastore's version and
spark.sql.hive.metastore.jars to $HIVE_HOME/lib/*.
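
Concretely, the wiring is roughly as follows (the metastore version value
and the job name below are just placeholders, not my exact setup):

    export SPARK_DIST_CLASSPATH=$(hadoop classpath)

    spark-submit \
      --conf spark.sql.hive.metastore.version=3.1.1 \
      --conf "spark.sql.hive.metastore.jars=$HIVE_HOME/lib/*" \
      my_job.py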

With this, I am able to read from and write to Hive successfully from my
Spark jobs. So my question and doubt are the same as yours: is it just
working by chance? How and when does Spark use the hive-2.3.7* jars as
opposed to the metastore jars?

What if my Hive tables use serdes and functions from my Hive 3.x cluster?
How will Spark be able to use them at runtime? I hope someone has a clear
understanding of how Spark works with Hive.

On Thu, Oct 22, 2020 at 12:48 PM Kimahriman <adam...@gmail.com> wrote:

> I have always been a little confused about the different hive-version
> integration as well. To expand on this question, we have a Hive 3.1.1
> metastore that we can successfully interact with using the -Phive profile
> with Hive 2.3.7. We do not use the Hive 3.1.1 jars anywhere in our Spark
> applications. Are we just lucky that the 2.3.7 jars are compatible for our
> use cases with the 3.1.1 metastore? Or are the
> `spark.sql.hive.metastore.jars` only used if you are using a direct JDBC
> connection and acting as the metastore?
>
> Also FWIW, the documentation only claims compatibility up to Hive version
> 3.1.2. Not sure if there are any breaking changes in 3.2 and beyond.
>
