dataproblems opened a new issue, #11997:
URL: https://github.com/apache/hudi/issues/11997
**Describe the problem you faced**
Writes to a Hudi table on S3 fail with a `FileNotFoundException` on a file in the
archived folder under the `.hoodie` directory. I have verified that the file does
exist in S3 and am puzzled as to what might be causing this. The
structured streaming query consistently fails after the 17th batch.
**To Reproduce**
Steps to reproduce the behavior:
1. Create the necessary base table using a bulk insert
2. Start consuming from kinesis and upsert to the base table
3. Eventually, after about 17 batches, the exception below occurs
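The pipeline in steps 1–3 looks roughly like this (a simplified sketch, not the exact job: the Kinesis connector options, checkpoint location, trigger interval, and table path are placeholders, and `UpsertOptions` is the map shown under Additional context below):

```scala
// Simplified sketch of the failing pipeline; source options and paths are placeholders.
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("hudi-upsert").getOrCreate()

// Step 2: consume from Kinesis (connector options elided for brevity).
val kinesisStream: DataFrame = spark.readStream
  .format("kinesis")
  .load()

// Upsert each micro-batch into the base table created by the initial bulk insert.
kinesisStream.writeStream
  .trigger(Trigger.ProcessingTime("60 seconds"))
  .option("checkpointLocation", "s3://some-bucket/some-prefix/checkpoints/")
  .foreachBatch { (batchDf: DataFrame, batchId: Long) =>
    batchDf.write
      .format("hudi")
      .options(UpsertOptions) // the map shown under Additional context
      .option("hoodie.table.name", "table-name")
      .mode("append")
      .save("s3://some-bucket/some-prefix/table-name/")
  }
  .start()
  .awaitTermination()
```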
**Expected behavior**
I expect no such failures: the Hudi table should simply be updated with the
new data. As a sanity check, I removed the Hudi write and wrote the batch
output directly to S3; in that case the Spark job ran well beyond batch 17.
**Environment Description**
* Hudi version : 1.0.0-beta1
* Spark version : 3.3.2
* Hive version : 3.1.3
* Hadoop version : 3.3.3
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : No
**Additional context**
We are not running any async table services, and this is a single writer
updating the Hudi table.
Hudi Upsert Configuration:
```scala
val UpsertOptions: Map[String, String] = Map(
  DataSourceWriteOptions.OPERATION.key() -> DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL,
  DataSourceWriteOptions.TABLE_TYPE.key() -> DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL,
  HoodieStorageConfig.PARQUET_COMPRESSION_CODEC_NAME.key() -> "snappy",
  HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key() -> "2147483648",
  "hoodie.parquet.small.file.limit" -> "1073741824",
  "hoodie.upsert.shuffle.parallelism" -> "5",
  HoodieMetadataConfig.ENABLE_METADATA_INDEX_COLUMN_STATS.key() -> "true",
  HoodieIndexConfig.INDEX_TYPE.key() -> "RECORD_INDEX",
  "hoodie.metadata.enable" -> "true",
  "hoodie.datasource.write.hive_style_partitioning" -> "true",
  "hoodie.cleaner.policy" -> "KEEP_LATEST_COMMITS",
  "hoodie.cleaner.commits.retained" -> "10",
  "hoodie.metadata.record.index.enable" -> "true"
)
```
**Stacktrace**
```
Caused by: org.apache.hudi.exception.HoodieException: Failed to instantiate Metadata table
	at org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:293)
	at org.apache.hudi.client.SparkRDDWriteClient.initMetadataTable(SparkRDDWriteClient.java:273)
	at org.apache.hudi.client.BaseHoodieWriteClient.doInitTable(BaseHoodieWriteClient.java:1250)
	at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1290)
	at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:139)
	at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:224)
	at org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:506)
	at org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:196)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:121)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:144)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:104)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:101)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:626)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:179)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:626)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:602)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:97)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:84)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:82)
	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:125)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:860)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:390)
Caused by: org.apache.hudi.exception.HoodieCommitException: Failed to write commits
	at org.apache.hudi.client.timeline.LSMTimelineWriter.write(LSMTimelineWriter.java:120)
	at org.apache.hudi.client.timeline.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:112)
	at org.apache.hudi.client.BaseHoodieTableServiceClient.archive(BaseHoodieTableServiceClient.java:788)
	at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:885)
	at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:895)
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.performTableServices(HoodieBackedTableMetadataWriter.java:1325)
	at org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:290)
	... 77 more
Caused by: java.io.FileNotFoundException: No such file or directory 's3://some-bucket/some-prefix/table-name/.hoodie/metadata/.hoodie/archived/00000000000000010_00000000000000012_0.parquet'
	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:529)
	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:617)
	at org.apache.hudi.common.fs.HoodieWrapperFileSystem.lambda$getFileStatus$17(HoodieWrapperFileSystem.java:410)
	at org.apache.hudi.common.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:114)
	at org.apache.hudi.common.fs.HoodieWrapperFileSystem.getFileStatus(HoodieWrapperFileSystem.java:404)
	at org.apache.hudi.client.timeline.LSMTimelineWriter.getFileEntry(LSMTimelineWriter.java:309)
	at org.apache.hudi.client.timeline.LSMTimelineWriter.updateManifest(LSMTimelineWriter.java:158)
	at org.apache.hudi.client.timeline.LSMTimelineWriter.updateManifest(LSMTimelineWriter.java:137)
	at org.apache.hudi.client.timeline.LSMTimelineWriter.write(LSMTimelineWriter.java:118)
	... 83 more
24/09/23 22:33:51 INFO SparkContext: Invoking stop() from shutdown hook
24/09/23 22:33:51 INFO SparkUI: Stopped Spark web UI at http://ip-10-0-171-12.ec2.internal:4040
24/09/23 22:33:51 INFO YarnClientSchedulerBackend: Interrupting monitor thread
24/09/23 22:33:51 INFO YarnClientSchedulerBackend: Shutting down all executors
24/09/23 22:33:51 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
24/09/23 22:33:51 INFO YarnClientSchedulerBackend: YARN client scheduler backend Stopped
24/09/23 22:33:51 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/09/23 22:33:51 INFO MemoryStore: MemoryStore cleared
24/09/23 22:33:51 INFO BlockManager: BlockManager stopped
24/09/23 22:33:51 INFO BlockManagerMaster: BlockManagerMaster stopped
24/09/23 22:33:51 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/09/23 22:33:51 INFO SparkContext: Successfully stopped SparkContext
24/09/23 22:33:51 INFO ShutdownHookManager: Shutdown hook called
24/09/23 22:33:51 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-d3efe56e-91fa-4bf9-930b-78ac8c76ff79
24/09/23 22:33:51 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-adf36161-48cc-4f63-922d-d6f16b5d9be4
Command exiting with ret '1'
```