ravs11 opened a new issue #4609:
URL: https://github.com/apache/hudi/issues/4609


   **Describe the problem you faced**
   
   I am running a Spark batch job that reads parquet files from HDFS, bulk inserts them into a Hudi table, and then runs inline clustering with a z-order index. The output is written back to HDFS in parquet format. The job intermittently fails with a HoodieClusteringException. Please let me know if I need to provide any other details.
   
   **Hudi Configuration**
   
   .option("hoodie.table.name", "target_table")
         .option("hoodie.datasource.write.table.name", "target_table")
         .option("hoodie.datasource.write.operation", "bulk_insert")
         .option("hoodie.sql.insert.mode", "non-strict")
         .option("hoodie.datasource.write.precombine.field", "log_timestamp")
         .option("hoodie.datasource.write.recordkey.field", 
"deviceid,session_id,event_id")
         .option("hoodie.datasource.write.partitionpath.field", 
"grass_region,utc_date,hour,datehour")
         .option("hoodie.datasource.write.keygenerator.class", 
"org.apache.hudi.keygen.ComplexKeyGenerator")
         .option("hoodie.datasource.write.hive_style_partitioning", "true")
         .option("hoodie.bulkinsert.shuffle.parallelism", "2000")
         .option("hoodie.bulkinsert.sort.mode", "NONE")
         .option("hoodie.parquet.compression.codec", "zstd")
         .option("hoodie.clustering.inline", "true")
         .option("hoodie.clustering.inline.max.commits", "1")
         .option("hoodie.clustering.plan.strategy.target.file.max.bytes", 
"1073741824")
         .option("hoodie.clustering.plan.strategy.small.file.limit", 
"536870912")
         .option("hoodie.clustering.plan.strategy.sort.columns", 
"page_type,page_section_0,target_type")
         .option("hoodie.layout.optimize.enable", "true")
         .option("hoodie.layout.optimize.strategy", "z-order")
   
   **Environment Description**
   
   * Hudi version : 0.10.0
   
   * Spark version : 3.1.2
   
   * Hadoop version : 3.2
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   **Stacktrace**
   
   ```
   22/01/15 18:40:16 ERROR AppendDataExec: Data source write support org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWrite@1bdaf906 is aborting.
   22/01/15 18:40:16 ERROR DataSourceInternalWriterHelper: Commit 20220115181353796 aborted
   22/01/15 18:40:17 ERROR AppendDataExec: Data source write support org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWrite@1bdaf906 failed to abort.
   22/01/15 18:40:17 ERROR ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Writing job failed.
   org.apache.spark.SparkException: Writing job failed.
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:383)
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:336)
        at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.writeWithV2(WriteToDataSourceV2Exec.scala:218)
        at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.run(WriteToDataSourceV2Exec.scala:225)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.doExecute(V2CommandExec.scala:55)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
        at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:991)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:991)
        at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:370)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
        at org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:480)
        at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:162)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
        at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:991)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:991)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
        at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
        at com.abc.de.common_traffic.dwd.pipeline.CIVHudiJobRunner.createPipeline(CIVHudiJobRunner.scala:63)
        at com.abc.de.common.spark.runtime.SparkJobRunner.run(SparkJobRunner.scala:68)
        at com.abc.de.common_traffic.dwd.CIVHudiMain$.main(CIVHudiMain.scala:17)
        at com.abc.de.common_traffic.dwd.CIVHudiMain.main(CIVHudiMain.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:743)
   Caused by: org.apache.hudi.exception.HoodieException: unable to transition clustering inflight to complete: 20220115181514083
        at org.apache.hudi.internal.DataSourceInternalWriterHelper.commit(DataSourceInternalWriterHelper.java:86)
        at org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWrite.commit(HoodieDataSourceInternalBatchWrite.java:93)
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:371)
        ... 55 more
        Suppressed: org.apache.hudi.exception.HoodieRollbackException: Failed to rollback hdfs://R2/projects/data_trafficmart/hdfs/dev/hudi_z_order//view_di/ commits 20220115181353796
                at org.apache.hudi.client.AbstractHoodieWriteClient.rollback(AbstractHoodieWriteClient.java:655)
                at org.apache.hudi.client.AbstractHoodieWriteClient.rollback(AbstractHoodieWriteClient.java:597)
                at org.apache.hudi.internal.DataSourceInternalWriterHelper.abort(DataSourceInternalWriterHelper.java:94)
                at org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWrite.abort(HoodieDataSourceInternalBatchWrite.java:98)
                at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:378)
                ... 55 more
        Caused by: java.lang.IllegalArgumentException: Cannot use marker based rollback strategy on completed instant:[20220115181353796__commit__COMPLETED]
                at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
                at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.<init>(BaseRollbackActionExecutor.java:90)
                at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.<init>(BaseRollbackActionExecutor.java:71)
                at org.apache.hudi.table.action.rollback.CopyOnWriteRollbackActionExecutor.<init>(CopyOnWriteRollbackActionExecutor.java:48)
                at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.rollback(HoodieSparkCopyOnWriteTable.java:343)
                at org.apache.hudi.client.AbstractHoodieWriteClient.rollback(AbstractHoodieWriteClient.java:640)
                ... 59 more
   Caused by: org.apache.hudi.exception.HoodieClusteringException: unable to transition clustering inflight to complete: 20220115181514083
        at org.apache.hudi.client.SparkRDDWriteClient.completeClustering(SparkRDDWriteClient.java:395)
        at org.apache.hudi.client.SparkRDDWriteClient.completeTableService(SparkRDDWriteClient.java:470)
        at org.apache.hudi.client.SparkRDDWriteClient.cluster(SparkRDDWriteClient.java:364)
        at org.apache.hudi.client.AbstractHoodieWriteClient.lambda$inlineCluster$15(AbstractHoodieWriteClient.java:1103)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
        at org.apache.hudi.client.AbstractHoodieWriteClient.inlineCluster(AbstractHoodieWriteClient.java:1101)
        at org.apache.hudi.client.AbstractHoodieWriteClient.runTableServicesInline(AbstractHoodieWriteClient.java:478)
        at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:206)
        at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:176)
        at org.apache.hudi.internal.DataSourceInternalWriterHelper.commit(DataSourceInternalWriterHelper.java:83)
        ... 57 more
   Caused by: org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.
        at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$12(DataSource.scala:211)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:211)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:419)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239)
        at org.apache.hudi.index.zorder.ZOrderingIndexHelper.updateZIndexFor(ZOrderingIndexHelper.java:420)
        at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.updateZIndex(HoodieSparkCopyOnWriteTable.java:218)
        at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.updateMetadataIndexes(HoodieSparkCopyOnWriteTable.java:176)
        at org.apache.hudi.client.SparkRDDWriteClient.completeClustering(SparkRDDWriteClient.java:388)
        ... 66 more
   22/01/15 18:40:17 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.SparkException: Writing job failed.
   [stack trace identical to the one above, elided]
   )
   22/01/15 18:40:17 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.spark.SparkException: Writing job failed.
   [stack trace identical to the one above, elided]
   )
   ```
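   
   A note on the bottom of the chain above: the failure surfaces in `ZOrderingIndexHelper.updateZIndexFor`, where a `DataFrameReader.load` of parquet data throws `AnalysisException: Unable to infer schema for Parquet`. Spark raises this exact message whenever the parquet reader is pointed at a path containing no readable parquet footers (for example, an existing but empty directory) and no schema is supplied. A minimal sketch that reproduces the same message, assuming a local Spark session and a hypothetical empty directory:
   
   ```
   import java.nio.file.Files
   import org.apache.spark.sql.SparkSession
   
   val spark = SparkSession.builder()
     .appName("parquet-infer-schema-repro")
     .master("local[*]")
     .getOrCreate()
   
   // An existing directory that contains no parquet files (hypothetical).
   val emptyDir = Files.createTempDirectory("empty_parquet").toString
   
   // Throws: org.apache.spark.sql.AnalysisException:
   //   Unable to infer schema for Parquet. It must be specified manually.
   spark.read.parquet(emptyDir).show()
   ```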
   
   

