Mahmood Abu Awwad created HIVE-26634:
----------------------------------------

             Summary: [Hive][Spark] EntityNotFoundException, Database 
global_temp not found, when connecting Hive metastore to AWS Glue
                 Key: HIVE-26634
                 URL: https://issues.apache.org/jira/browse/HIVE-26634
             Project: Hive
          Issue Type: Bug
            Reporter: Mahmood Abu Awwad


While running our batch jobs with Apache Spark and Hive on an EMR cluster, 
using AWS Glue as the metastore, the following error occurs:
{code:java}
EntityNotFoundException ,Database global_temp not found {code}
{code:java}
2022-10-09T10:36:31,262 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
ql.Driver (:()) - Completed compiling 
command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736); 
Time taken: 0.02 seconds
2022-10-09T10:36:31,262 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
reexec.ReExecDriver (:()) - Execution #1 of query
2022-10-09T10:36:31,262 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
ql.Driver (:()) - Concurrency mode is disabled, not creating a lock manager
2022-10-09T10:36:31,262 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
ql.Driver (:()) - Executing 
command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736): 
show views
2022-10-09T10:36:31,263 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
ql.Driver (:()) - Starting task [Stage-0:DDL] in serial mode
2022-10-09T10:36:32,270 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
ql.Driver (:()) - Completed executing 
command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736); 
Time taken: 1.008 seconds
2022-10-09T10:36:32,270 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
ql.Driver (:()) - OK
2022-10-09T10:36:32,270 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
ql.Driver (:()) - Concurrency mode is disabled, not creating a lock manager
2022-10-09T10:36:32,271 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
exec.ListSinkOperator (:()) - RECORDS_OUT_INTERMEDIATE:0, 
RECORDS_OUT_OPERATOR_LIST_SINK_0:0,
2022-10-09T10:36:32,271 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
CliDriver (:()) - Time taken: 1.028 seconds
2022-10-09T10:36:32,271 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
conf.HiveConf (HiveConf.java:getLogIdVar(5104)) - Using the default value 
passed in for log id: 573c4ce0-f73c-439b-829d-1f0b25db45ec
2022-10-09T10:36:32,272 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
session.SessionState (SessionState.java:resetThreadName(452)) - Resetting 
thread name to  main
2022-10-09T10:36:46,512 INFO  [main([])]: conf.HiveConf 
(HiveConf.java:getLogIdVar(5104)) - Using the default value passed in for log 
id: 573c4ce0-f73c-439b-829d-1f0b25db45ec
2022-10-09T10:36:46,513 INFO  [main([])]: session.SessionState 
(SessionState.java:updateThreadName(441)) - Updating thread name to 
573c4ce0-f73c-439b-829d-1f0b25db45ec main
2022-10-09T10:36:46,515 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
ql.Driver (:()) - Compiling 
command(queryId=hadoop_20221009103646_f390a868-07d7-49f1-b620-70d40e5e2cff): 
use global_temp
2022-10-09T10:36:46,530 INFO  [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
ql.Driver (:()) - Concurrency mode is disabled, not creating a lock manager
2022-10-09T10:36:46,666 ERROR [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: 
ql.Driver (:()) - FAILED: SemanticException [Error 10072]: Database does not 
exist: global_temp
org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: 
global_temp
        at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getDatabase(BaseSemanticAnalyzer.java:2171)
        at 
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeSwitchDatabase(DDLSemanticAnalyzer.java:1413)
        at 
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:516)
        at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
        at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
        at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
        at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
        at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236) {code}

global_temp is a system-preserved database that the Spark session uses to hold 
global temporary views.
This database is not created in our AWS Glue catalog, because creating it there 
fails all our EMR jobs with this error:
{code:java}
ERROR ApplicationMaster: User class threw exception: 
org.apache.spark.SparkException: global_temp is a system preserved database, 
please rename your existing database to resolve the name conflict, or set a 
different value for spark.sql.globalTempDatabase, and launch your Spark 
application again. {code}
We are not creating or using any global temporary views in our project; this 
appears to be a check that Spark itself performs when initializing the Spark 
session.
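For context, Spark reserves the database name configured by spark.sql.globalTempDatabase (default global_temp) for these views when the session starts. A possible mitigation, sketched here as an untested assumption rather than a verified fix, is to point that property at a non-conflicting name at submit time (the Spark error above mentions the same property); CLASS_NAME and JAR_FILE_PATH are the same placeholders used in the submit command below:

```shell
# Sketch of a possible mitigation (assumption, not verified on this cluster):
# rename Spark's reserved global-temp database so it cannot collide with a
# database name in the Glue catalog.
spark-submit \
  --deploy-mode cluster --master yarn \
  --conf spark.sql.globalTempDatabase=spark_global_temp \
  --class CLASS_NAME JAR_FILE_PATH
```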

The EMR configuration used:
{code:java}
[
   {
      "Classification":"hive-site",
      "Properties":{
         "hive.msck.path.validation":"ignore",
         "hive.exec.max.dynamic.partitions":"1000000",
         "hive.vectorized.execution.enabled":"true",
         
"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
         "hive.exec.dynamic.partition.mode":"nonstrict",
         "hive.exec.max.dynamic.partitions.pernode":"500000"
      },
      "Configurations":[
         
      ]
   },
   {
      "Classification":"yarn-site",
      "Properties":{
         
"yarn.resourcemanager.scheduler.class":"org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
         "yarn.log-aggregation.retain-seconds":"-1",
         "yarn.scheduler.fair.allow-undeclared-pools":"true",
         "yarn.log-aggregation-enable":"true",
         "yarn.scheduler.fair.user-as-default-queue":"true",
         "yarn.nodemanager.remote-app-log-dir":"LOGS_PATH",
         "yarn.scheduler.fair.preemption":"true",
         "yarn.scheduler.fair.preemption.cluster-utilization-threshold":"0.8",
         "yarn.resourcemanager.am.max-attempts":"10"
      },
      "Configurations":[
         
      ]
   },
   {
      "Classification":"mapred-site",
      "Properties":{
         
"mapred.jobtracker.taskScheduler":"org.apache.hadoop.mapred.FairScheduler"
      },
      "Configurations":[
         
      ]
   },
   {
      "Classification":"presto-connector-hive",
      "Properties":{
         "hive.recursive-directories":"true",
         "hive.metastore.glue.datacatalog.enabled":"true"
      },
      "Configurations":[
         
      ]
   },
   {
      "Classification":"spark-log4j",
      "Properties":{
         "log4j.logger.com.project":"DEBUG",
         "log4j.appender.rolling.layout":"org.apache.log4j.PatternLayout",
         "log4j.logger.org.apache.spark":"WARN",
         "log4j.appender.rolling.encoding":"UTF-8",
         "log4j.appender.rolling.layout.ConversionPattern":"%d{yy/MM/dd 
HH:mm:ss} %p %c{1}: %m%n",
         "log4j.appender.rolling.maxBackupIndex":"5",
         "log4j.appender.rolling":"org.apache.log4j.RollingFileAppender",
         "log4j.rootLogger":"WARN, rolling",
         "log4j.logger.org.eclipse.jetty":"WARN",
         "log4j.appender.rolling.maxFileSize":"1000MB",
         
"log4j.appender.rolling.file":"${spark.yarn.app.container.log.dir}/spark.log"
      },
      "Configurations":[
         
      ]
   },
   {
      "Classification":"emrfs-site",
      "Properties":{
         "fs.s3.maxConnections":"10000"
      },
      "Configurations":[
         
      ]
   },
   {
      "Classification":"spark-hive-site",
      "Properties":{
         
"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
      },
      "Configurations":[
         
      ]
   }
] {code}

and the spark-submit command is:
{code:java}
 spark-submit --deploy-mode cluster --master yarn --conf 
spark.yarn.appMasterEnv.ENV=DEV --conf spark.executorEnv.ENV=DEV  --conf 
spark.network.timeout=6000s --conf spark.sql.catalogImplementation=hive --conf 
spark.driver.memory=15g --conf 
spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
 --class CLASS_NAME JAR_FILE_PATH
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
