[ https://issues.apache.org/jira/browse/FLINK-23725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403519#comment-17403519 ]
Paul Lin commented on FLINK-23725:
----------------------------------

[~todd5167] You're right. The commit operations should be retryable and idempotent.

IIUC, you mean overwriting the already existing target file if there's a conflict? I think that would be a problem if the two files were written by different executions without consistent state (e.g. starting a new job without a savepoint while keeping the files of previous executions).

How about this:
* if only one of the source file and the target file exists, ignore the result;
* if neither of them exists, there's data loss, so throw an exception;
* if both of them exist, there's a name conflict, so throw an exception.

> HadoopFsCommitter, file rename failure
> --------------------------------------
>
>                 Key: FLINK-23725
>                 URL: https://issues.apache.org/jira/browse/FLINK-23725
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem, Connectors / Hadoop Compatibility, FileSystems
>    Affects Versions: 1.11.1, 1.12.1
>            Reporter: todd
>            Priority: Major
>
> When an HDFS file is committed and the part file already exists, the rename
> fails and only returns false. Should an exception be thrown saying that the
> part file already exists, or should a related log message be printed?
>
> ```
> // org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.HadoopFsCommitter#commit
> public void commit() throws IOException {
>     final Path src = recoverable.tempFile();
>     final Path dest = recoverable.targetFile();
>     final long expectedLength = recoverable.offset();
>     try {
>         // rename returns true or false; the return value is ignored here
>         fs.rename(src, dest);
>     } catch (IOException e) {
>         throw new IOException(
>                 "Committing file by rename failed: " + src + " to " + dest, e);
>     }
> }
> ```

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
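
For illustration, the three cases proposed in the comment could be sketched roughly as below. This is not the Flink implementation: it uses `java.nio.file` instead of the Hadoop `FileSystem` API so that it runs standalone, and the names `CommitCheckSketch`, `Outcome`, `classify` and `commit` are hypothetical. It also assumes one reading of "ignore the results": if only the source exists the rename is performed, and if only the target exists the commit is treated as already done.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CommitCheckSketch {

    /** Hypothetical classification of the source/target state. */
    enum Outcome { OK, DATA_LOSS, NAME_CONFLICT }

    /** Maps the existence of the two files to one of the three proposed cases. */
    static Outcome classify(boolean srcExists, boolean destExists) {
        if (srcExists && destExists) {
            return Outcome.NAME_CONFLICT; // both exist: name conflict
        }
        if (!srcExists && !destExists) {
            return Outcome.DATA_LOSS;     // neither exists: data loss
        }
        return Outcome.OK;                // exactly one exists
    }

    /** Retryable commit: calling it again after a successful rename is a no-op. */
    static void commit(Path src, Path dest) throws IOException {
        boolean srcExists = Files.exists(src);
        boolean destExists = Files.exists(dest);
        switch (classify(srcExists, destExists)) {
            case NAME_CONFLICT:
                throw new IOException(
                        "Name conflict, both files exist: " + src + " and " + dest);
            case DATA_LOSS:
                throw new IOException(
                        "Data loss, neither file exists: " + src + " nor " + dest);
            default:
                // Assumption: only the source exists -> rename it;
                // only the target exists -> already committed, nothing to do.
                if (srcExists) {
                    Files.move(src, dest);
                }
        }
    }
}
```

Note that the existence checks and the rename are not atomic here, so a real committer would still have to consider concurrent writers.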