I haven't looked at it in detail... Somebody's been trying to do that in https://github.com/apache/spark/pull/20659, but that's kind of a huge change.
The parts where I'd be concerned are:

- using Hive's original hive-exec package brings in a bunch of shaded dependencies, which may break Spark in weird ways. HIVE-16391 was supposed to fix that, but nothing has actually been done as part of that bug.
- the hive-exec "core" package avoids the shaded dependencies, but it used to have issues of its own. Maybe it's better now; I haven't looked.
- what about the current thrift server, which is basically a fork of the Hive 1.2 source code?
- when using Hadoop 3 with an old metastore client that doesn't know about Hadoop 3, things may break.

The last one has two possible fixes: declare that Hadoop 3 builds of Spark don't support old metastores, or add code so that Spark loads a separate copy of the Hadoop libraries in that case (search for "sharesHadoopClasses" in IsolatedClientLoader for a starting point).

If updating Hive, it would be good to avoid having to fork it, as is done currently. But I'm not sure that will be possible given the current hive-exec packaging.

On Mon, Apr 2, 2018 at 2:58 PM, Reynold Xin <r...@databricks.com> wrote:
> Is it difficult to upgrade the Hive execution version to the latest version? The
> metastore used to be an issue, but that part has now been separated from the
> execution part.
>
>
> On Mon, Apr 2, 2018 at 1:57 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> Saisai filed SPARK-23534, but the main blocking issue is really
>> SPARK-18673.
>>
>>
>> On Mon, Apr 2, 2018 at 1:00 PM, Reynold Xin <r...@databricks.com> wrote:
>> > Does anybody know what needs to be done in order for Spark to support
>> > Hadoop 3?
>> >
>>
>>
>> --
>> Marcelo
>
>

--
Marcelo

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
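[The classloader-isolation fix mentioned above (loading a separate copy of the Hadoop classes when "sharesHadoopClasses" is false) boils down to child-first class loading. A minimal sketch in plain Java, under the assumption that this is roughly what IsolatedClientLoader does; the class name `IsolatedHadoopLoader` and its flag are hypothetical stand-ins, not Spark's actual implementation:]

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch: a child-first classloader that resolves Hadoop
// classes from its own jars, so an old metastore client can see its own
// Hadoop copy instead of the one Spark itself was built against.
class IsolatedHadoopLoader extends URLClassLoader {
    private final boolean sharesHadoopClasses;

    IsolatedHadoopLoader(URL[] urls, ClassLoader parent, boolean sharesHadoopClasses) {
        super(urls, parent);
        this.sharesHadoopClasses = sharesHadoopClasses;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!sharesHadoopClasses && name.startsWith("org.apache.hadoop.")) {
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    try {
                        // Child-first: look in this loader's own jars before
                        // delegating to the parent (the usual order is reversed).
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        c = super.loadClass(name, resolve);
                    }
                }
                if (resolve) resolveClass(c);
                return c;
            }
        }
        // Everything else follows normal parent-first delegation.
        return super.loadClass(name, resolve);
    }
}

public class Demo {
    public static void main(String[] args) throws Exception {
        IsolatedHadoopLoader loader = new IsolatedHadoopLoader(
            new URL[0], Demo.class.getClassLoader(), false);
        // Non-Hadoop classes are shared with the parent loader as usual.
        System.out.println(loader.loadClass("java.lang.String") == String.class);
    }
}
```

[With real metastore-client jars passed as the `urls` argument, the Hadoop branch would pick up those copies first; with `sharesHadoopClasses = true` everything delegates to the parent, which is the failure mode described above when the client doesn't know about Hadoop 3.]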