tmljob commented on issue #1438:
URL: https://github.com/apache/incubator-seatunnel/issues/1438#issuecomment-1063656321
I also encountered the same problem (on 1.5.7). I suspect it is caused by a Spark configuration issue, but I don't know which setting is involved and would appreciate some clarification.
The task configuration is as follows:
```
spark {
  # seatunnel defined streaming batch duration in seconds
  spark.streaming.batchDuration = 5
  spark.app.name = "mysql_to_hive"
  spark.ui.port = 13000
  #spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation = true
}

input {
  mysql {
    url = "jdbc:mysql://10.30.4.160:3306/cdh_cm"
    table = "metrics"
    result_table_name = "scr_metrics"
    user = "root"
    password = "root"
  }
}

filter {
}

output {
  Hive {
    source_table_name = "scr_metrics"
    result_table_name = "cdh_test.metrics"
    save_mode = "overwrite"
    sink_columns = "metric_id,optimistic_lock_version,metric_identifier,name,metric"
  }
}
```
1. The same configuration runs without problems on nodes inside the CDH cluster.
2. On a node outside the CDH cluster, with Spark installed and the cluster's hdfs-site.xml, core-site.xml, hive-site.xml, and yarn-site.xml copied into the Spark configuration path, the job fails with a NoSuchDatabaseException, as follows:
```
2022-03-10 10:57:39 INFO Client:54 -
	 client token: N/A
	 diagnostics: User class threw exception: java.lang.Exception: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'cdh_test' not found;
	at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:62)
	at io.github.interestinglab.waterdrop.Waterdrop.main(Waterdrop.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:678)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'cdh_test' not found;
	at org.apache.spark.sql.catalyst.catalog.ExternalCatalog$class.requireDbExists(ExternalCatalog.scala:42)
	at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireDbExists(InMemoryCatalog.scala:45)
	at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:331)
	at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:142)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:415)
	at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:405)
	at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:400)
	at io.github.interestinglab.waterdrop.output.batch.Hive.process(Hive.scala:81)
	at io.github.interestinglab.waterdrop.Waterdrop$.outputProcess(Waterdrop.scala:278)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:242)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:241)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at io.github.interestinglab.waterdrop.Waterdrop$.batchProcessing(Waterdrop.scala:241)
	at io.github.interestinglab.waterdrop.Waterdrop$.io$github$interestinglab$waterdrop$Waterdrop$$entrypoint(Waterdrop.scala:144)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply$mcV$sp(Waterdrop.scala:57)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
	at scala.util.Try$.apply(Try.scala:192)
	at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:57)
	... 6 more
	 ApplicationMaster host: test-4-177
	 ApplicationMaster RPC port: 3158
	 queue: root.users.root
	 start time: 1646881027981
	 final status: FAILED
	 tracking URL: http://test-4-178:8088/proxy/application_1645686071960_0055/
	 user: root
2022-03-10 10:57:39 ERROR Client:70 - Application diagnostics message: User class threw exception: java.lang.Exception: org.apache.spark.sql.catalyst.analysis.
```
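One thing I notice in the trace is that the table lookup goes through `InMemoryCatalog` rather than a Hive catalog, so perhaps Spark on that node is not picking up the Hive metastore at all. As a guess (I'm not sure these can be passed through seatunnel's spark block, and the metastore host below is a placeholder; the real value is in hive-site.xml), forcing the Hive catalog might confirm this:

```
spark {
  spark.app.name = "mysql_to_hive"
  # force the Hive external catalog instead of the in-memory one
  spark.sql.catalogImplementation = "hive"
  # placeholder value; copy the real one from hive-site.xml
  spark.hadoop.hive.metastore.uris = "thrift://<metastore-host>:9083"
}
```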
3. If the database name is removed from the seatunnel task configuration, the first run succeeds and a directory named after the table is created under the /user/hive/warehouse/ path on HDFS, but the database cannot be found via `show databases`; running the job again then fails with the following error:
```
Exception in thread "main" java.lang.Exception: org.apache.spark.sql.AnalysisException: Can not create the managed table('`metrics`'). The associated location('/user/hive/warehouse/metrics') already exists.;
	at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:62)
	at io.github.interestinglab.waterdrop.Waterdrop.main(Waterdrop.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.sql.AnalysisException: Can not create the managed table('`metrics`'). The associated location('/user/hive/warehouse/metrics') already exists.;
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateTableLocation(SessionCatalog.scala:331)
	at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:170)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
	at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:465)
	at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:444)
	at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:400)
	at io.github.interestinglab.waterdrop.output.batch.Hive.process(Hive.scala:81)
	at io.github.interestinglab.waterdrop.Waterdrop$.outputProcess(Waterdrop.scala:278)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:242)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:241)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at io.github.interestinglab.waterdrop.Waterdrop$.batchProcessing(Waterdrop.scala:241)
	at io.github.interestinglab.waterdrop.Waterdrop$.io$github$interestinglab$waterdrop$Waterdrop$$entrypoint(Waterdrop.scala:144)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply$mcV$sp(Waterdrop.scala:57)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
	at scala.util.Try$.apply(Try.scala:192)
	at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:57)
	... 13 more
2022-03-10 11:05:37 INFO SparkUI:54 - Stopped Spark web UI at http://10.30.4.160:13000
```
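Incidentally, my config already has `spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation` commented out. If I understand that flag correctly, uncommenting it should at least let the repeated run get past the "location already exists" check, though it would not explain why the database is invisible in the first place:

```
spark {
  # allow saveAsTable with overwrite to reuse the existing warehouse directory
  spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation = true
}
```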