Hi,

I know about the JIRA for this error 
(https://issues.apache.org/jira/browse/SPARK-18112), and I have read all the 
comments and even the PR for it.

But I am facing this issue on AWS EMR, and only in the Oozie Spark action. I am 
looking for someone who can give me a hint or direction, so I can see whether I 
can overcome this issue on EMR.

I am testing a simple Spark application on EMR-5.12.2, which comes with Hadoop 
2.8.3 + HCatalog 2.3.2 + Spark 2.2.1, using the AWS Glue Data Catalog for both 
Hive and Spark table metadata.

First of all, both Hive and Spark work fine with AWS Glue as the metadata 
catalog, and my Spark application works with spark-submit.
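
For reference, the application itself is trivial; a minimal sketch of what it 
does (with placeholder object/table names, not my real code) looks roughly 
like this:

import org.apache.spark.sql.SparkSession

// Minimal sketch (placeholder names): all the application really needs is
// catalog access, which comes from enableHiveSupport().
object GlueCatalogTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("GlueCatalogTest")
      .enableHiveSupport()   // this is what triggers creation of the Hive metastore client
      .getOrCreate()

    spark.sql("show databases").show()
    // "some_table" is just a placeholder for one of my Glue tables
    spark.sql("select count(*) from sampledb.some_table").show()

    spark.stop()
  }
}

And on the same node, spark-shell can see the Glue databases directly: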

[hadoop@ip-172-31-65-232 oozieJobs]$ spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.sql("show databases").show
+---------------+
|   databaseName|
+---------------+
|        default|
|googleanalytics|
|       sampledb|
+---------------+


I can access and query the database I created in Glue without any issue in 
spark-shell or spark-sql.
And, relevant to the problem described later, I can see that when it works in 
this case, "spark.sql.hive.metastore.version" is not set in spark-shell; the 
default value is shown below:

scala> spark.conf.get("spark.sql.hive.metastore.version")
res2: String = 1.2.1


Even though it shows the version as "1.2.1", I know that with Glue the Hive 
metastore version will be "2.3.2"; I can see "hive-metastore-2.3.2-amzn-1.jar" 
in the Hive library path.

Now here comes the issue: the Spark code, with "enableHiveSupport" on the Spark 
session, works with spark-submit on the command line, but when I run it in the 
Oozie Spark action it fails with the following error in the Oozie runtime:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], 
main() threw exception, HIVE_STATS_JDBC_TIMEOUT
java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT
        at org.apache.spark.sql.hive.HiveUtils$.hiveClientConfigurations(HiveUtils.scala:200)
        at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:265)
        at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
        at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)


I know this is most likely caused by the Oozie runtime classpath, but I have 
spent days trying and still cannot find a solution. We use Spark as the core of 
our ETL engine, and the ability to manage and query the Hive catalog is 
critical for us.

Here is what puzzles me:

  *   I know this issue was supposed to be fixed in Spark 2.2.0, and on this 
EMR we are using Spark 2.2.1.
  *   There is a 1.2.1 version of the Hive metastore jar under the Spark jars 
on EMR. Does this mean that in the successful spark-shell runtime, Spark is 
indeed using the 1.2.1 version of hive-metastore?

[hadoop@ip-172-31-65-232 oozieJobs]$ ls /usr/lib/spark/jars/*hive-meta*
/usr/lib/spark/jars/hive-metastore-1.2.1-spark2-amzn-0.jar

  *   There is a 2.3.2 version of the Hive metastore jar under the Hive 
component on this EMR, which I believe is the one pointing to Glue, right?

[hadoop@ip-172-31-65-232 oozieJobs]$ ls /usr/lib/hive/lib/*hive-meta*
/usr/lib/hive/lib/hive-metastore-2.3.2-amzn-1.jar  
/usr/lib/hive/lib/hive-metastore.jar

  *   I specified "oozie.action.sharelib.for.spark=spark,hive" in Oozie, and I 
can see the Oozie runtime loads the jars from both the spark and hive 
sharelibs. There is NO hive-metastore-1.2.1-spark2-amzn-0.jar in the Oozie 
SPARK sharelib, and there is indeed hive-metastore-2.3.2-amzn-1.jar in the 
Oozie HIVE sharelib.
  *   Based on my understanding of 
https://issues.apache.org/jira/browse/SPARK-18112, here is what I have done so 
far to try to fix this in the Oozie runtime (the equivalent programmatic form 
of these settings is sketched after this list), but none of it works:
     *   I added hive-metastore-1.2.1-spark2-amzn-0.jar into the Oozie Spark 
sharelib on HDFS and ran "oozie admin -sharelibupdate". After that, I confirmed 
this library is loaded in the Oozie runtime log of my Spark action, but I got 
the same error message.
     *   I added "--conf spark.sql.hive.metastore.version=2.3.2" in the 
<spark-opts> of my Oozie Spark action and confirmed this configuration in the 
Spark session, but I still got the same error message above.
     *   I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf 
spark.sql.hive.metastore.jars=maven", but still got the same error message.
     *   I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf 
spark.sql.hive.metastore.jars=/etc/spark/conf/hive-site.xml,/usr/lib/spark/jars/*" 
in the Oozie Spark action, but got the same error message.
     *   I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf 
hive.metastore.uris=thrift://ip-172-31-65-232.ec2.internal:9083 --conf 
spark.sql.hive.metastore.jars=/etc/spark/conf/hive-site.xml,/usr/lib/spark/jars/*" 
in the Oozie Spark action, but got the same error.
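
For clarity, here is the programmatic equivalent of the configurations I tried 
in <spark-opts> above, set on the SparkSession builder instead (the values are 
exactly the ones from my attempts; the app name is a placeholder). I am not 
claiming this is the fix, it is just the same settings expressed in code:

import org.apache.spark.sql.SparkSession

// Same settings as in the <spark-opts> attempts above; these are static SQL
// configs, so they have to be set before getOrCreate().
val spark = SparkSession.builder()
  .appName("GlueCatalogTest")   // placeholder
  .config("spark.sql.hive.metastore.version", "2.3.2")
  .config("spark.sql.hive.metastore.jars",
    "/etc/spark/conf/hive-site.xml,/usr/lib/spark/jars/*")   // or "maven", as in one attempt
  .config("hive.metastore.uris", "thrift://ip-172-31-65-232.ec2.internal:9083")
  .enableHiveSupport()
  .getOrCreate()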

I have run out of options to try, and I really have no idea what is missing in 
the Oozie runtime that causes this error in Spark.

Let me know if you have any idea.

Thanks

Yong
