Sparsamkeit opened a new issue, #10988:
URL: https://github.com/apache/hudi/issues/10988

   **Describe the problem you faced**
   
   When a Flink task that streams from a Hudi MOR table fails (or is stopped) and restarts after a period of time, it hits an exception saying that a log file does not exist.
   
   This may be because the log files have since been compacted into the parquet base file and cleaned up.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Start a Flink SQL task that streams from the Hudi MOR table, with savepoints enabled.
   2. Stop the read task while the write task keeps writing.
   3. Restore the Flink read task from the savepoint.
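   Step 1 can be set up with Flink SQL along these lines (a sketch only: the schema, path, and check interval are placeholders, though the connector options are standard Hudi Flink options):

   ```sql
   -- Streaming read of a Hudi MOR table; schema and path are placeholders.
   CREATE TABLE hudi_mor_src (
     uuid STRING PRIMARY KEY NOT ENFORCED,
     name STRING,
     ts TIMESTAMP(3)
   ) WITH (
     'connector' = 'hudi',
     'path' = 'hdfs:///tmp/hudi/mor_table',
     'table.type' = 'MERGE_ON_READ',
     'read.streaming.enabled' = 'true',
     'read.streaming.check-interval' = '4'
   );

   SELECT * FROM hudi_mor_src;
   ```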
   
   **Environment Description**
   
   * Hudi version : 0.13.1
   
   * Flink version : 1.14.5
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   **Additional context**
   
   
https://github.com/apache/hudi/blob/016bcf769b6ade87aa551f81432b22a09799b339/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/StreamReadOperator.java#L140
   
   In StreamReadOperator, all pending splits are saved to operator state when a snapshot is taken, and each split records concrete log-file paths. If those log files are merged into parquet files before the task is restored from the savepoint, reading the restored splits throws a file-does-not-exist exception:
   
   
![Clip_2024-04-10_13-24-56](https://github.com/apache/hudi/assets/46555471/53638e00-aaa7-40a3-bb99-be92d67207fe)
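
   The failure mode can be illustrated with a toy model (plain Python, not the Hudi/Flink API; all class and function names here are made up for illustration): the snapshot captures split objects that pin concrete log-file paths, compaction later removes those files, and the restored reader then tries to open a path that no longer exists.

   ```python
   import os
   import pickle
   import tempfile

   class InputSplit:
       """Toy stand-in for a MOR input split that pins concrete log-file paths."""
       def __init__(self, base_file, log_paths):
           self.base_file = base_file
           self.log_paths = log_paths

   def snapshot_state(splits):
       # Analogous to snapshotting pending splits into operator state:
       # the serialized state embeds the log paths as they were at snapshot time.
       return pickle.dumps(splits)

   def restore_and_read(state):
       # A restored reader opens the exact paths captured in the snapshot.
       splits = pickle.loads(state)
       for split in splits:
           for path in split.log_paths:
               with open(path) as f:
                   f.read()

   if __name__ == "__main__":
       work = tempfile.mkdtemp()
       log = os.path.join(work, ".f1_20240410.log.1")
       with open(log, "w") as f:
           f.write("delta records")
       state = snapshot_state([InputSplit(os.path.join(work, "f1.parquet"), [log])])

       # Writer keeps running: compaction merges the log into a new parquet
       # base file and the old log file is eventually cleaned up.
       os.remove(log)

       try:
           restore_and_read(state)
       except FileNotFoundError as e:
           print("restore failed:", e)
   ```

   In the real operator the splits would need to be re-resolved against the latest file slices on restore (or the reader would need to fall back when a pinned log file is gone) to avoid this.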
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
