abhijeetkushe opened a new issue #4831:
URL: https://github.com/apache/hudi/issues/4831


   **Describe the problem you faced**
   
   We have been running Hudi DeltaStreamer on emr-5.31.0 (details below) for 
more than a year in production, and the dataset holds 45 TB of data. The dataset 
was out of sync for almost the entire month of Feb (the last commit was Feb 1 
19:05 UTC). To bring it back to the present day, we decided to write 15 days' 
worth of data in one run.
   
   But the job was taking a long time (4 hrs), so I decided to kill it and 
start a new EMR cluster with more executors. After restarting, I found that the 
job resumes from the Feb 1 19:05 UTC checkpoint but immediately stops all the 
executors. The .hoodie folder also contains a **commit.requested** and an 
**inflight** file. I tried deleting both files but I still get the same 
behavior. Can I use the Hudi CLI to restore the Hudi table to the last 
successful commit and start from that checkpoint?
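
   To make the timeline state described above concrete, here is a minimal sketch that classifies a listing of the .hoodie folder into the last completed commit and any pending instants. It is not the Hudi API: the function name `summarize_timeline`, the sample file names, and the instant times are hypothetical, and it assumes instant files are named `<instant>.<action>[.<state>]`, with a bare `.inflight` suffix for an inflight commit.

```python
from typing import Dict, List, Optional, Union

# Hypothetical listing of a .hoodie folder; the instant times are made up.
SAMPLE_FILES = [
    "20220201190500.commit.requested",
    "20220201190500.inflight",
    "20220201190500.commit",            # completed commit
    "20220216101500.commit.requested",  # pending: no completed file
    "20220216101500.inflight",
    "hoodie.properties",
]

# Actions treated as write commits (assumption for this sketch).
COMMIT_ACTIONS = {"commit", "deltacommit", "inflight"}


def summarize_timeline(
    file_names: List[str],
) -> Dict[str, Union[Optional[str], List[str]]]:
    """Return the latest completed commit instant and the pending instants."""
    seen = set()
    completed = set()
    for name in file_names:
        instant, _, rest = name.partition(".")
        if not instant.isdigit():
            continue  # skip hoodie.properties, hidden folders, etc.
        action = rest.split(".")[0]
        if action not in COMMIT_ACTIONS:
            continue  # skip clean, rollback, savepoint, ... instants
        seen.add(instant)
        if rest in ("commit", "deltacommit"):
            completed.add(instant)
    return {
        "last_completed": max(completed) if completed else None,
        "pending": sorted(seen - completed),
    }


print(summarize_timeline(SAMPLE_FILES))
```

   In this sketch an instant counts as pending when it has a requested or inflight file but no completed commit file, which is the state the killed job left behind.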
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Run Hudi DeltaStreamer on emr-5.31.0 with 1 master and 4 core nodes (all 
m5.4xlarge), using 12 executors with 5g memory each, 
   spark.executor.cores set to 4, and spark.task.cpus set to 1. I can provide more 
details if needed, but this configuration has been 
   tested and has been running efficiently for a year. The number of executors 
was very low. 
   2. Read 178,536 files totaling 1000.7 GB and write them to the Hudi table 
(571,478 files, 45.0 TB).
   3. Kill the job after 4 hours.
   4. Restart Hudi DeltaStreamer with 28 executors and otherwise the same configuration as in Step 1.
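
   For a rough sense of what the two runs differ by, the sketch below computes how many tasks Spark can run at once from the settings listed in the steps. The helper name `concurrent_task_slots` is hypothetical; the formula assumes every requested executor is actually allocated and that each executor runs `spark.executor.cores // spark.task.cpus` tasks in parallel.

```python
def concurrent_task_slots(num_executors: int, executor_cores: int, task_cpus: int = 1) -> int:
    """Upper bound on tasks running at the same time across all executors."""
    if num_executors < 0 or executor_cores < 1 or task_cpus < 1:
        raise ValueError("executor and CPU counts must be positive")
    # Each executor can run executor_cores // task_cpus tasks in parallel.
    return num_executors * (executor_cores // task_cpus)


original_run = concurrent_task_slots(num_executors=12, executor_cores=4, task_cpus=1)
restarted_run = concurrent_task_slots(num_executors=28, executor_cores=4, task_cpus=1)
print(original_run, restarted_run)
```

   This only bounds parallelism; it says nothing about why the restarted job stops its executors.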
   
   **Expected behavior**
   
   I expected Hudi DeltaStreamer to roll back the previous inflight commit, 
start from the Feb 1 checkpoint, and write all 1000.7 GB of files successfully.
   
   **Environment Description**
   
   Hudi version : 0.6.0
   
   Spark version : 2.4.6
   
   Hive version : 2.3.7
   
   Hadoop version : Amazon 2.10.0
   
   Storage (HDFS/S3/GCS..) : S3
   
   Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   I did see this issue, https://github.com/apache/hudi/issues/2072, which 
refers to using the Hudi CLI to create a savepoint and reset the table back to 
that point. Is that possible?
   
   **Stacktrace**
   
   I did not see any exception in the log.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
