[ https://issues.apache.org/jira/browse/HIVE-26634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ayush Saxena resolved HIVE-26634.
---------------------------------
    Fix Version/s: Not Applicable
       Resolution: Cannot Reproduce

This is a Spark thing or an AWS thing, resolving!!!

> [Hive][Spark] EntityNotFoundException, Database global_temp not found, when connecting hive metastore to aws glue
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26634
>                 URL: https://issues.apache.org/jira/browse/HIVE-26634
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Mahmood Abu Awwad
>            Priority: Blocker
>             Fix For: Not Applicable
>
> While running our batches using Apache Spark with Hive on an EMR cluster, with AWS Glue as the metastore, the following error occurs:
> {code:java}
> EntityNotFoundException, Database global_temp not found
> {code}
> {code:java}
> 2022-10-09T10:36:31,262 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Completed compiling command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736); Time taken: 0.02 seconds
> 2022-10-09T10:36:31,262 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: reexec.ReExecDriver (:()) - Execution #1 of query
> 2022-10-09T10:36:31,262 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a lock manager
> 2022-10-09T10:36:31,262 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Executing command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736): show views
> 2022-10-09T10:36:31,263 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Starting task [Stage-0:DDL] in serial mode
> 2022-10-09T10:36:32,270 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Completed executing command(queryId=hadoop_20221009103631_214e4b6c-b0f2-496e-b9a8-86831b202736); Time taken: 1.008 seconds
> 2022-10-09T10:36:32,270 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - OK
> 2022-10-09T10:36:32,270 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a lock manager
> 2022-10-09T10:36:32,271 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: exec.ListSinkOperator (:()) - RECORDS_OUT_INTERMEDIATE:0, RECORDS_OUT_OPERATOR_LIST_SINK_0:0,
> 2022-10-09T10:36:32,271 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: CliDriver (:()) - Time taken: 1.028 seconds
> 2022-10-09T10:36:32,271 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: conf.HiveConf (HiveConf.java:getLogIdVar(5104)) - Using the default value passed in for log id: 573c4ce0-f73c-439b-829d-1f0b25db45ec
> 2022-10-09T10:36:32,272 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: session.SessionState (SessionState.java:resetThreadName(452)) - Resetting thread name to main
> 2022-10-09T10:36:46,512 INFO [main([])]: conf.HiveConf (HiveConf.java:getLogIdVar(5104)) - Using the default value passed in for log id: 573c4ce0-f73c-439b-829d-1f0b25db45ec
> 2022-10-09T10:36:46,513 INFO [main([])]: session.SessionState (SessionState.java:updateThreadName(441)) - Updating thread name to 573c4ce0-f73c-439b-829d-1f0b25db45ec main
> 2022-10-09T10:36:46,515 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Compiling command(queryId=hadoop_20221009103646_f390a868-07d7-49f1-b620-70d40e5e2cff): use global_temp
> 2022-10-09T10:36:46,530 INFO [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - Concurrency mode is disabled, not creating a lock manager
> 2022-10-09T10:36:46,666 ERROR [573c4ce0-f73c-439b-829d-1f0b25db45ec main([])]: ql.Driver (:()) - FAILED: SemanticException [Error 10072]: Database does not exist: global_temp
> org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: global_temp
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getDatabase(BaseSemanticAnalyzer.java:2171)
>     at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeSwitchDatabase(DDLSemanticAnalyzer.java:1413)
>     at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:516)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
>     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
> {code}
> global_temp is a system-preserved database used by the Spark session to hold global temporary views.
> This database is not created in our AWS Glue catalog, because creating it in Glue fails all our EMR jobs with this error:
> {code:java}
> ERROR ApplicationMaster: User class threw exception: org.apache.spark.SparkException: global_temp is a system preserved database, please rename your existing database to resolve the name conflict, or set a different value for spark.sql.globalTempDatabase, and launch your Spark application again.
> {code}
> We are not creating or using any global temporary views in our project; this appears to be a health check that Spark itself performs when initializing the Spark session.
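A minimal sketch of the behaviour described above, assuming a stock Spark session (the object name, view name, and the replacement database name spark_global_temp are illustrative, not taken from the report): global temporary views are registered in an in-memory, session-scoped database whose name is taken from spark.sql.globalTempDatabase (the property mentioned in the Spark error above; the default is global_temp), and that database is not created in the external metastore, which would explain why a Hive-side "use global_temp" against Glue cannot resolve it.

{code:scala}
// Illustrative sketch only: shows that a global temp view lives in the
// Spark-session-scoped database named by spark.sql.globalTempDatabase,
// not in the external (Glue) catalog. "spark_global_temp" is a hypothetical name.
import org.apache.spark.sql.SparkSession

object GlobalTempViewSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("global-temp-view-sketch")
      .master("local[*]")
      // Static SQL conf, read once at session start; the default is "global_temp".
      .config("spark.sql.globalTempDatabase", "spark_global_temp")
      .getOrCreate()

    // The view is registered in the in-memory reserved database, not in the metastore.
    spark.range(10).createOrReplaceGlobalTempView("demo_view")

    // It has to be referenced through that database name.
    spark.sql("SELECT * FROM spark_global_temp.demo_view").show()

    spark.stop()
  }
}
{code}

If renaming the reserved database were needed, as the Spark error message itself suggests, the same property could presumably be added to the spark-submit command below as an extra --conf (the value above is hypothetical).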
> The EMR configuration used:
> {code:java}
> [
>   {
>     "Classification": "hive-site",
>     "Properties": {
>       "hive.msck.path.validation": "ignore",
>       "hive.exec.max.dynamic.partitions": "1000000",
>       "hive.vectorized.execution.enabled": "true",
>       "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
>       "hive.exec.dynamic.partition.mode": "nonstrict",
>       "hive.exec.max.dynamic.partitions.pernode": "500000"
>     },
>     "Configurations": []
>   },
>   {
>     "Classification": "yarn-site",
>     "Properties": {
>       "yarn.resourcemanager.scheduler.class": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
>       "yarn.log-aggregation.retain-seconds": "-1",
>       "yarn.scheduler.fair.allow-undeclared-pools": "true",
>       "yarn.log-aggregation-enable": "true",
>       "yarn.scheduler.fair.user-as-default-queue": "true",
>       "yarn.nodemanager.remote-app-log-dir": "LOGS_PATH",
>       "yarn.scheduler.fair.preemption": "true",
>       "yarn.scheduler.fair.preemption.cluster-utilization-threshold": "0.8",
>       "yarn.resourcemanager.am.max-attempts": "10"
>     },
>     "Configurations": []
>   },
>   {
>     "Classification": "mapred-site",
>     "Properties": {
>       "mapred.jobtracker.taskScheduler": "org.apache.hadoop.mapred.FairScheduler"
>     },
>     "Configurations": []
>   },
>   {
>     "Classification": "presto-connector-hive",
>     "Properties": {
>       "hive.recursive-directories": "true",
>       "hive.metastore.glue.datacatalog.enabled": "true"
>     },
>     "Configurations": []
>   },
>   {
>     "Classification": "spark-log4j",
>     "Properties": {
>       "log4j.logger.com.project": "DEBUG",
>       "log4j.appender.rolling.layout": "org.apache.log4j.PatternLayout",
>       "log4j.logger.org.apache.spark": "WARN",
>       "log4j.appender.rolling.encoding": "UTF-8",
>       "log4j.appender.rolling.layout.ConversionPattern": "%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n",
>       "log4j.appender.rolling.maxBackupIndex": "5",
>       "log4j.appender.rolling": "org.apache.log4j.RollingFileAppender",
>       "log4j.rootLogger": "WARN, rolling",
>       "log4j.logger.org.eclipse.jetty": "WARN",
>       "log4j.appender.rolling.maxFileSize": "1000MB",
>       "log4j.appender.rolling.file": "${spark.yarn.app.container.log.dir}/spark.log"
>     },
>     "Configurations": []
>   },
>   {
>     "Classification": "emrfs-site",
>     "Properties": {
>       "fs.s3.maxConnections": "10000"
>     },
>     "Configurations": []
>   },
>   {
>     "Classification": "spark-hive-site",
>     "Properties": {
>       "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
>     },
>     "Configurations": []
>   }
> ]
> {code}
> and the spark-submit command is:
> {code:java}
> spark-submit --deploy-mode cluster --master yarn \
>   --conf spark.yarn.appMasterEnv.ENV=DEV \
>   --conf spark.executorEnv.ENV=DEV \
>   --conf spark.network.timeout=6000s \
>   --conf spark.sql.catalogImplementation=hive \
>   --conf spark.driver.memory=15g \
>   --conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory \
>   --class CLASS_NAME JAR_FILE_PATH
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)