zuyanton opened a new issue #1764:
URL: https://github.com/apache/hudi/issues/1764


   
   **Describe the problem you faced**
   We are running a MoR table on EMR + Hudi + S3 with ```hoodie.consistency.check.enabled``` set to true and compaction configured to run inline. We update the table every ten minutes with new data, and we see the following issue (actually two issues).
   The first issue is that compaction fails from time to time with ```HoodieCommitException: Failed to complete commit 20200624012710 due to finalize errors. caused by HoodieIOException: Consistency check failed to ensure all files APPEAR.``` It looks like Hudi tries to clean up duplicate data files created by Spark retries, but the consistency check fails because the files are not there. The error does not appear when we disable the consistency check by setting hoodie.consistency.check.enabled to false, because Hudi then proceeds with the attempt to delete the non-existent duplicate files and wraps up the commit successfully; however, since we use S3, having the consistency check disabled is not ideal. The first issue happens more often on bigger tables (>400 GB) than on small ones (<100 GB).
   The second issue is that after the first issue happens, Hudi never changes the commit status and it stays INFLIGHT forever, which causes several other problems: log files with the same fileID as the parquet files that were part of the failed compaction never get compacted, and Hudi starts ignoring the cleaning settings and stops removing all the commits that happen after the failed commit. Although in our case the second issue is caused by the first, it still doesn't seem right to leave the compaction in INFLIGHT status after a failure.
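   For anyone hitting the same symptom: the stuck instant should be visible in the table's `.hoodie` timeline folder. A minimal check (the bucket/table path is the placeholder from this report; exact timeline file names may differ between Hudi versions, so we just grep for "inflight"):

   ```shell
   # List the Hudi timeline and look for instants stuck in an inflight state.
   # s3://bucketName/tableName is a placeholder path.
   aws s3 ls s3://bucketName/tableName/.hoodie/ | grep -i inflight
   ```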
   
   **To Reproduce**
   Create a MoR table with ~100 partitions saved to S3, then run updates for a while with the consistency check enabled and compaction set to run inline. Eventually one of the compaction jobs should fail and the compaction commit should stay in INFLIGHT status.
   
   
   **Environment Description**
   
   * Hudi version : 0.5.3
   
   * Spark version : 2.4.4
   
   * Hive version :
   
   * Hadoop version : 2.8.5
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Hudi settings that we use:
   ```
   "hoodie.consistency.check.enabled" -> "true",
   "hoodie.compact.inline.max.delta.commits" -> "12",
   "hoodie.compact.inline" -> "true",
   "hoodie.clean.automatic" -> "true",
   "hoodie.cleaner.commits.retained" -> "2",
   DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
   ```
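
   For context, the options above are passed to the Spark datasource writer roughly as follows. This is a minimal sketch, not our exact job: the table name, S3 path, and `df` are placeholders, and any record-key/precombine options our real job sets are omitted.

   ```scala
   import org.apache.hudi.DataSourceWriteOptions
   import org.apache.spark.sql.{DataFrame, SaveMode}

   // Sketch of the ten-minute upsert; `df` and the S3 path are placeholders.
   def writeHudi(df: DataFrame): Unit = {
     df.write
       .format("org.apache.hudi") // Hudi 0.5.x datasource
       .option("hoodie.table.name", "tableName")
       .option("hoodie.consistency.check.enabled", "true")
       .option("hoodie.compact.inline", "true")
       .option("hoodie.compact.inline.max.delta.commits", "12")
       .option("hoodie.clean.automatic", "true")
       .option("hoodie.cleaner.commits.retained", "2")
       .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
       .mode(SaveMode.Append)
       .save("s3://bucketName/tableName")
   }
   ```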
   
   **Stacktrace**
   
   ```  
   20/06/24 01:38:05 INFO HoodieTable: Removing duplicate data files created 
due to spark retries before committing. 
Paths=[s3://bucketName/tableName/30/5bb5c4d5-a54a-4682-93d1-98ef3222d887-1_0-30-9408_20200624012710.parquet]
   20/06/24 01:42:22 ERROR ApplicationMaster: User class threw exception: 
org.apache.hudi.exception.HoodieCommitException: Failed to complete commit 
20200624012710 due to finalize errors.
   org.apache.hudi.exception.HoodieCommitException: Failed to complete commit 
20200624012710 due to finalize errors.
        at 
org.apache.hudi.client.AbstractHoodieWriteClient.finalizeWrite(AbstractHoodieWriteClient.java:204)
        at 
org.apache.hudi.client.HoodieWriteClient.doCompactionCommit(HoodieWriteClient.java:1129)
        at 
org.apache.hudi.client.HoodieWriteClient.commitCompaction(HoodieWriteClient.java:1089)
        at 
org.apache.hudi.client.HoodieWriteClient.runCompaction(HoodieWriteClient.java:1072)
        at 
org.apache.hudi.client.HoodieWriteClient.compact(HoodieWriteClient.java:1043)
        at 
org.apache.hudi.client.HoodieWriteClient.lambda$forceCompact$12(HoodieWriteClient.java:1158)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
        at 
org.apache.hudi.client.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1155)
        at 
org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:502)
        at 
org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:157)
        at 
org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:101)
        at 
org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:92)
        at 
org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:268)
        at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:188)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
        at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
        at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
        at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
        at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
        at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
        at 
com.amazon.fdl.components.compaction.job.CompactionHudiJob2$.main(CompactionHudiJob2.scala:147)
        at 
com.amazon.fdl.components.compaction.job.CompactionHudiJob2.main(CompactionHudiJob2.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
   Caused by: org.apache.hudi.exception.HoodieIOException: Consistency check 
failed to ensure all files APPEAR
        at 
org.apache.hudi.table.HoodieTable.waitForAllFiles(HoodieTable.java:431)
        at 
org.apache.hudi.table.HoodieTable.cleanFailedWrites(HoodieTable.java:379)
        at org.apache.hudi.table.HoodieTable.finalizeWrite(HoodieTable.java:315)
        at 
org.apache.hudi.table.HoodieMergeOnReadTable.finalizeWrite(HoodieMergeOnReadTable.java:319)
        at 
org.apache.hudi.client.AbstractHoodieWriteClient.finalizeWrite(AbstractHoodieWriteClient.java:195)
        ... 42 more```
   
   

