Hi, I know the JIRA for this error (https://issues.apache.org/jira/browse/SPARK-18112), and I have read all the comments and even the PR for it.
But I am facing this issue on AWS EMR, and only in the Oozie Spark action. I am looking for someone who can give me a hint or direction, so I can see whether I can overcome this issue on EMR.

I am testing a simple Spark application on EMR-5.12.2, which comes with Hadoop 2.8.3 + HCatalog 2.3.2 + Spark 2.2.1, and uses the AWS Glue Data Catalog for both Hive and Spark table metadata. First of all, both Hive and Spark work fine with AWS Glue as the metadata catalog, and my Spark application works with spark-submit:

[hadoop@ip-172-31-65-232 oozieJobs]$ spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sql("show databases").show
+---------------+
|   databaseName|
+---------------+
|        default|
|googleanalytics|
|       sampledb|
+---------------+

I can access and query the database I created in Glue without any issue in spark-shell or spark-sql. And, as it relates to the problem below, I can see that in this working case "spark.sql.hive.metastore.version" is not set in spark-shell; the default value is shown here:

scala> spark.conf.get("spark.sql.hive.metastore.version")
res2: String = 1.2.1

Even though it shows the version as "1.2.1", I know that with Glue the Hive metastore version will be "2.3.2"; I can see "hive-metastore-2.3.2-amzn-1.jar" in the Hive library path.
Now here comes the issue: when I test the Spark code in the Oozie Spark action, with "enableHiveSupport" on the Spark session, it works with spark-submit on the command line, but fails with the following error in the Oozie runtime:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, HIVE_STATS_JDBC_TIMEOUT
java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT
        at org.apache.spark.sql.hive.HiveUtils$.hiveClientConfigurations(HiveUtils.scala:200)
        at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:265)
        at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
        at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)

I know this is most likely caused by the Oozie runtime classpath, but I have spent days trying and still cannot find a solution. We use Spark as the core of our ETL engine, and the ability to manage and query the Hive catalog is critical for us. Here is what puzzles me:

* I know this issue was supposed to be fixed in Spark 2.2.0, and on this EMR we are using Spark 2.2.1.
* There is a 1.2.1 version of the hive-metastore jar under the Spark jars on EMR. Does this mean that in the successful spark-shell runtime, Spark is indeed using the 1.2.1 version of hive-metastore?

[hadoop@ip-172-31-65-232 oozieJobs]$ ls /usr/lib/spark/jars/*hive-meta*
/usr/lib/spark/jars/hive-metastore-1.2.1-spark2-amzn-0.jar

* There is a 2.3.2 version of the hive-metastore jar under the Hive component on this EMR, which I believe points to Glue, right?
[hadoop@ip-172-31-65-232 oozieJobs]$ ls /usr/lib/hive/lib/*hive-meta*
/usr/lib/hive/lib/hive-metastore-2.3.2-amzn-1.jar  /usr/lib/hive/lib/hive-metastore.jar

* I specified "oozie.action.sharelib.for.spark=spark,hive" in Oozie, and I can see that the Oozie runtime loads the jars from both the spark and hive sharelibs. There is NO hive-metastore-1.2.1-spark2-amzn-0.jar in the Oozie SPARK sharelib, but there is indeed hive-metastore-2.3.2-amzn-1.jar in the Oozie HIVE sharelib.
* Based on my understanding of (https://issues.apache.org/jira/browse/SPARK-18112), here is what I have tried so far to fix this in the Oozie runtime; none of it works:
    * I added hive-metastore-1.2.1-spark2-amzn-0.jar to the Oozie spark sharelib in HDFS and ran "oozie admin -sharelibupdate". After that, I confirmed this library is loaded in the Oozie runtime log of my Spark action, but I got the same error message.
    * I added "--conf spark.sql.hive.metastore.version=2.3.2" in the <spark-opts> of my Oozie Spark action, and confirmed this configuration in the Spark session, but I still got the same error message as above.
    * I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf spark.sql.hive.metastore.jars=maven", but still got the same error message.
    * I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf spark.sql.hive.metastore.jars=/etc/spark/conf/hive-site.xml,/usr/lib/spark/jars/*" in the Oozie Spark action, but got the same error message.
    * I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf hive.metastore.uris=thrift://ip-172-31-65-232.ec2.internal:9083 --conf spark.sql.hive.metastore.jars=/etc/spark/conf/hive-site.xml,/usr/lib/spark/jars/*" in the Oozie Spark action, but got the same error.

I have run out of options to try, and I really have no idea what is missing in the Oozie runtime that causes this error in Spark. Let me know if you have any idea.

Thanks
Yong
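PS: in case it helps to see the setup, the Spark action in my workflow.xml is roughly the following (the action name, job name, class, and jar path here are placeholders, not my real ones; the real one has "--conf spark.sql.hive.metastore.version=2.3.2" and the other attempts above in <spark-opts>):

```xml
<action name="spark-etl">
    <spark xmlns="uri:oozie:spark-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn</master>
        <mode>cluster</mode>
        <name>MySparkJob</name>
        <class>com.example.MyEtlJob</class>
        <jar>${nameNode}/user/hadoop/lib/my-etl.jar</jar>
        <spark-opts>--conf spark.sql.hive.metastore.version=2.3.2</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

and "oozie.action.sharelib.for.spark=spark,hive" is set in the job.properties for this workflow.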