tmljob commented on issue #1438:
URL: https://github.com/apache/incubator-seatunnel/issues/1438#issuecomment-1063656321
I also encountered the same problem (on 1.5.7). I suspect it is caused by a Spark configuration issue, but I don't know which setting is involved and would appreciate some clarification.
The task configuration is as follows:
```
spark {
  # seatunnel defined streaming batch duration in seconds
  spark.streaming.batchDuration = 5
  spark.app.name = "mysql_to_hive"
  spark.ui.port = 13000
  #spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation = true
}

input {
  mysql {
    url = "jdbc:mysql://10.30.4.160:3306/cdh_cm"
    table = "metrics"
    result_table_name = "scr_metrics"
    user = "root"
    password = "root"
  }
}

filter {
}

output {
  Hive {
    source_table_name = "scr_metrics"
    result_table_name = "cdh_test.metrics"
    save_mode = "overwrite"
    sink_columns = "metric_id,optimistic_lock_version,metric_identifier,name,metric"
  }
}
```
1. The same configuration runs without problems on nodes inside the CDH cluster.
2. On a node outside the CDH cluster, with Spark installed and the cluster's hdfs-site.xml, core-site.xml, hive-site.xml, and yarn-site.xml copied into the Spark configuration path, the job fails with a NoSuchDatabaseException, as follows:
```
2022-03-10 10:57:39 INFO Client:54 -
	 client token: N/A
	 diagnostics: User class threw exception: java.lang.Exception: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'cdh_test' not found;
	at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:62)
	at io.github.interestinglab.waterdrop.Waterdrop.main(Waterdrop.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:678)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'cdh_test' not found;
	at org.apache.spark.sql.catalyst.catalog.ExternalCatalog$class.requireDbExists(ExternalCatalog.scala:42)
	at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireDbExists(InMemoryCatalog.scala:45)
	at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:331)
	at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:142)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:415)
	at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:405)
	at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:400)
	at io.github.interestinglab.waterdrop.output.batch.Hive.process(Hive.scala:81)
	at io.github.interestinglab.waterdrop.Waterdrop$.outputProcess(Waterdrop.scala:278)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:242)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:241)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at io.github.interestinglab.waterdrop.Waterdrop$.batchProcessing(Waterdrop.scala:241)
	at io.github.interestinglab.waterdrop.Waterdrop$.io$github$interestinglab$waterdrop$Waterdrop$$entrypoint(Waterdrop.scala:144)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply$mcV$sp(Waterdrop.scala:57)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
	at scala.util.Try$.apply(Try.scala:192)
	at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:57)
	... 6 more
	 ApplicationMaster host: test-4-177
	 ApplicationMaster RPC port: 3158
	 queue: root.users.root
	 start time: 1646881027981
	 final status: FAILED
	 tracking URL: http://test-4-178:8088/proxy/application_1645686071960_0055/
	 user: root
2022-03-10 10:57:39 ERROR Client:70 - Application diagnostics message: User class threw exception: java.lang.Exception: org.apache.spark.sql.catalyst.analysis.
```
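One thing I notice in the trace is that the table lookup goes through `InMemoryCatalog` rather than a Hive catalog, so perhaps Spark on that node is not picking up the Hive metastore at all. As a guess (I'm not sure these can be passed through seatunnel's spark block, and the metastore host below is a placeholder; the real value is in hive-site.xml), forcing the Hive catalog might confirm this:

```
spark {
  spark.app.name = "mysql_to_hive"
  # force the Hive external catalog instead of the in-memory one
  spark.sql.catalogImplementation = "hive"
  # placeholder value; copy the real one from hive-site.xml
  spark.hadoop.hive.metastore.uris = "thrift://<metastore-host>:9083"
}
```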
3. If the database name is removed from the seatunnel task configuration, the first run succeeds and a directory named after the table is created under the /user/hive/warehouse/ path on HDFS, but the database cannot be found via `show databases`; running the job again then fails with the following error:
```
Exception in thread "main" java.lang.Exception: org.apache.spark.sql.AnalysisException: Can not create the managed table('`metrics`'). The associated location('/user/hive/warehouse/metrics') already exists.;
	at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:62)
	at io.github.interestinglab.waterdrop.Waterdrop.main(Waterdrop.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.sql.AnalysisException: Can not create the managed table('`metrics`'). The associated location('/user/hive/warehouse/metrics') already exists.;
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateTableLocation(SessionCatalog.scala:331)
	at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:170)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
	at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:465)
	at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:444)
	at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:400)
	at io.github.interestinglab.waterdrop.output.batch.Hive.process(Hive.scala:81)
	at io.github.interestinglab.waterdrop.Waterdrop$.outputProcess(Waterdrop.scala:278)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:242)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:241)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at io.github.interestinglab.waterdrop.Waterdrop$.batchProcessing(Waterdrop.scala:241)
	at io.github.interestinglab.waterdrop.Waterdrop$.io$github$interestinglab$waterdrop$Waterdrop$$entrypoint(Waterdrop.scala:144)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply$mcV$sp(Waterdrop.scala:57)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
	at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
	at scala.util.Try$.apply(Try.scala:192)
	at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:57)
	... 13 more
2022-03-10 11:05:37 INFO SparkUI:54 - Stopped Spark web UI at http://10.30.4.160:13000
```
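Incidentally, my config already has `spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation` commented out. If I understand that flag correctly, uncommenting it should at least let the repeated run get past the "location already exists" check, though it would not explain why the database is invisible in the first place:

```
spark {
  # allow saveAsTable with overwrite to reuse the existing warehouse directory
  spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation = true
}
```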