[ https://issues.apache.org/jira/browse/HIVE-26634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ayush Saxena resolved HIVE-26634.
---------------------------------
    Fix Version/s: Not Applicable
       Resolution: Cannot Reproduce

This is a Spark thing or an AWS thing, resolving!!!

> [Hive][Spark] EntityNotFoundException, Database global_temp not found, when connecting hive metastore to aws glue
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26634
>                 URL: https://issues.apache.org/jira/browse/HIVE-26634
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Mahmood Abu Awwad
>            Priority: Blocker
>             Fix For: Not Applicable
>
> While running our batches using Apache Spark with Hive on an EMR cluster, with AWS Glue as the metastore, the following error occurs:
> {code:java}
> EntityNotFoundException, Database global_temp not found
> {code}
> {code:java}
> 2022-10-09T10:36:31,262 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Completed compiling command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736); Time taken: 0.02 seconds
> 2022-10-09T10:36:31,262 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: reexec.ReExecDriver (:()) - Execution #1 of query
> 2022-10-09T10:36:31,262 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a lock manager
> 2022-10-09T10:36:31,262 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Executing command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736): show views
> 2022-10-09T10:36:31,263 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Starting task [Stage-0:DDL] in serial mode
> 2022-10-09T10:36:32,270 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Completed executing command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736); Time taken: 1.008 seconds
> 2022-10-09T10:36:32,270 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - OK
> 2022-10-09T10:36:32,270 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a lock manager
> 2022-10-09T10:36:32,271 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: exec.ListSinkOperator (:()) - RECORDS_OUT_INTERMEDIATE:0, RECORDS_OUT_OPERATOR_LIST_SINK_0:0,
> 2022-10-09T10:36:32,271 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: CliDriver (:()) - Time taken: 1.028 seconds
> 2022-10-09T10:36:32,271 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: conf.HiveConf (HiveConf.java:getLogIdVar(5104)) - Using the default value passed in for log id: 573c4ce0-f73c-439b-829d-1f0b25db45ec
> 2022-10-09T10:36:32,272 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: session.SessionState (SessionState.java:resetThreadName(452)) - Resetting thread name to main
> 2022-10-09T10:36:46,512 INFO [main([])]: conf.HiveConf (HiveConf.java:getLogIdVar(5104)) - Using the default value passed in for log id: 573c4ce0-f73c-439b-829d-1f0b25db45ec
> 2022-10-09T10:36:46,513 INFO [main([])]: session.SessionState (SessionState.java:updateThreadName(441)) - Updating thread name to 573c4ce0-f73c-439b-829d-1f0b25db45ec main
> 2022-10-09T10:36:46,515 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Compiling command(queryId=hadoop_20221009103646_f390a868-07d7-49f1-b620-70d40e5e2cff): use global_temp
> 2022-10-09T10:36:46,530 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a lock manager
> 2022-10-09T10:36:46,666 ERROR [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - FAILED: SemanticException [Error 10072]: Database does not exist: global_temp
> org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: global_temp
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getDatabase(BaseSemanticAnalyzer.java:2171)
>     at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeSwitchDatabase(DDLSemanticAnalyzer.java:1413)
>     at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:516)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
>     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
> {code}
> global_temp is a system-preserved database used by the Spark session to hold global temporary views.
> This database is not created in our AWS Glue catalog, because creating it in Glue fails all our EMR jobs with this error:
> {code:java}
> ERROR ApplicationMaster: User class threw exception: org.apache.spark.SparkException: global_temp is a system preserved database, please rename your existing database to resolve the name conflict, or set a different value for spark.sql.globalTempDatabase, and launch your Spark application again.
> {code}
> We are not creating or using any global temporary views in our project; this appears to be a health check that Spark itself performs when initializing the Spark session.
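A minimal sketch of the behaviour described above, assuming a stock Spark session (the object name, view name, and the replacement database name spark_global_temp are illustrative, not taken from the report): global temporary views are registered in an in-memory, session-scoped database whose name is taken from spark.sql.globalTempDatabase (the property mentioned in the Spark error above; the default is global_temp), and that database is not created in the external metastore, which would explain why a Hive-side "use global_temp" against Glue cannot resolve it.

{code:scala}
// Illustrative sketch only: shows that a global temp view lives in the
// Spark-session-scoped database named by spark.sql.globalTempDatabase,
// not in the external (Glue) catalog. "spark_global_temp" is a hypothetical name.
import org.apache.spark.sql.SparkSession

object GlobalTempViewSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("global-temp-view-sketch")
      .master("local[*]")
      // Static SQL conf, read once at session start; the default is "global_temp".
      .config("spark.sql.globalTempDatabase", "spark_global_temp")
      .getOrCreate()

    // The view is registered in the in-memory reserved database, not in the metastore.
    spark.range(10).createOrReplaceGlobalTempView("demo_view")

    // It has to be referenced through that database name.
    spark.sql("SELECT * FROM spark_global_temp.demo_view").show()

    spark.stop()
  }
}
{code}

If renaming the reserved database were needed, as the Spark error message itself suggests, the same property could presumably be added to the spark-submit command below as an extra --conf (the value above is hypothetical).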
> The EMR configuration used:
> {code:java}
> [
>   {
>     "Classification": "hive-site",
>     "Properties": {
>       "hive.msck.path.validation": "ignore",
>       "hive.exec.max.dynamic.partitions": "1000000",
>       "hive.vectorized.execution.enabled": "true",
>       "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
>       "hive.exec.dynamic.partition.mode": "nonstrict",
>       "hive.exec.max.dynamic.partitions.pernode": "500000"
>     },
>     "Configurations": []
>   },
>   {
>     "Classification": "yarn-site",
>     "Properties": {
>       "yarn.resourcemanager.scheduler.class": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
>       "yarn.log-aggregation.retain-seconds": "-1",
>       "yarn.scheduler.fair.allow-undeclared-pools": "true",
>       "yarn.log-aggregation-enable": "true",
>       "yarn.scheduler.fair.user-as-default-queue": "true",
>       "yarn.nodemanager.remote-app-log-dir": "LOGS_PATH",
>       "yarn.scheduler.fair.preemption": "true",
>       "yarn.scheduler.fair.preemption.cluster-utilization-threshold": "0.8",
>       "yarn.resourcemanager.am.max-attempts": "10"
>     },
>     "Configurations": []
>   },
>   {
>     "Classification": "mapred-site",
>     "Properties": {
>       "mapred.jobtracker.taskScheduler": "org.apache.hadoop.mapred.FairScheduler"
>     },
>     "Configurations": []
>   },
>   {
>     "Classification": "presto-connector-hive",
>     "Properties": {
>       "hive.recursive-directories": "true",
>       "hive.metastore.glue.datacatalog.enabled": "true"
>     },
>     "Configurations": []
>   },
>   {
>     "Classification": "spark-log4j",
>     "Properties": {
>       "log4j.logger.com.project": "DEBUG",
>       "log4j.appender.rolling.layout": "org.apache.log4j.PatternLayout",
>       "log4j.logger.org.apache.spark": "WARN",
>       "log4j.appender.rolling.encoding": "UTF-8",
>       "log4j.appender.rolling.layout.ConversionPattern": "%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n",
>       "log4j.appender.rolling.maxBackupIndex": "5",
>       "log4j.appender.rolling": "org.apache.log4j.RollingFileAppender",
>       "log4j.rootLogger": "WARN, rolling",
>       "log4j.logger.org.eclipse.jetty": "WARN",
>       "log4j.appender.rolling.maxFileSize": "1000MB",
>       "log4j.appender.rolling.file": "${spark.yarn.app.container.log.dir}/spark.log"
>     },
>     "Configurations": []
>   },
>   {
>     "Classification": "emrfs-site",
>     "Properties": {
>       "fs.s3.maxConnections": "10000"
>     },
>     "Configurations": []
>   },
>   {
>     "Classification": "spark-hive-site",
>     "Properties": {
>       "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
>     },
>     "Configurations": []
>   }
> ]
> {code}
> and the spark-submit command is:
> {code:java}
> spark-submit --deploy-mode cluster --master yarn \
>   --conf spark.yarn.appMasterEnv.ENV=DEV \
>   --conf spark.executorEnv.ENV=DEV \
>   --conf spark.network.timeout=6000s \
>   --conf spark.sql.catalogImplementation=hive \
>   --conf spark.driver.memory=15g \
>   --conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory \
>   --class CLASS_NAME JAR_FILE_PATH
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)