[ 
https://issues.apache.org/jira/browse/FLINK-23725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402650#comment-17402650
 ] 

Paul Lin edited comment on FLINK-23725 at 8/21/21, 5:18 PM:
------------------------------------------------------------

I've also run into this issue. If the target file already exists, the committer 
silently skips the commit, which may lead to data loss.

The root cause is that #rename does not throw an exception if the target file 
already exists or the source file doesn't exist. Instead, it returns false to 
indicate that the operation failed, as documented in [Hadoop 
ClientProtocol|https://github.com/apache/hadoop/blob/b6d19718204af02da6e2ed0b83d5936824371fc0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java#L520].

I think in both cases we should throw an exception.
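
A minimal sketch of what that could look like, under stated assumptions: `Renamer` is a hypothetical stand-in for the boolean-returning rename of Hadoop's FileSystem, and `commit` here is an illustration, not the actual Flink code or a proposed patch. The only point is that a `false` result is turned into an IOException instead of being dropped.

```java
import java.io.IOException;

public class RenameCommitSketch {

    /** Hypothetical stand-in for a rename that reports failure by returning false. */
    interface Renamer {
        boolean rename(String src, String dest) throws IOException;
    }

    /** Commits by rename and treats a 'false' result as a failure. */
    static void commit(Renamer fs, String src, String dest) throws IOException {
        final boolean renamed;
        try {
            renamed = fs.rename(src, dest);
        } catch (IOException e) {
            throw new IOException(
                    "Committing file by rename failed: " + src + " to " + dest, e);
        }
        if (!renamed) {
            // Target already exists or source is missing: fail loudly instead of skipping.
            throw new IOException(
                    "Committing file by rename failed (rename returned false): "
                            + src + " to " + dest);
        }
    }

    public static void main(String[] args) {
        // Simulate a rename that fails without throwing, e.g. because dest already exists.
        Renamer alwaysFalse = (src, dest) -> false;
        try {
            commit(alwaysFalse, "/tmp/.part-0.inprogress", "/tmp/part-0");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether the committer should throw in both cases or first check whether the target already holds the expected data is a separate design question; the sketch only covers the "throw" option.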



> HadoopFsCommitter, file rename failure
> --------------------------------------
>
>                 Key: FLINK-23725
>                 URL: https://issues.apache.org/jira/browse/FLINK-23725
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem, Connectors / Hadoop 
> Compatibility, FileSystems
>    Affects Versions: 1.11.1, 1.12.1
>            Reporter: todd
>            Priority: Major
>
> When a file is written to HDFS and the target part file already exists, the 
> rename fails but only returns false. Should we throw an exception saying that 
> the part file already exists, or at least log the failure?
>  
> ```
> // org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.HadoopFsCommitter#commit
> public void commit() throws IOException {
>     final Path src = recoverable.tempFile();
>     final Path dest = recoverable.targetFile();
>     final long expectedLength = recoverable.offset();
>     try {
>         // rename() only returns true or false; the result is ignored here
>         fs.rename(src, dest);
>     } catch (IOException e) {
>         throw new IOException(
>                 "Committing file by rename failed: " + src + " to " + dest, e);
>     }
> }
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
