[ https://issues.apache.org/jira/browse/FLINK-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728770#comment-14728770 ]
ASF GitHub Bot commented on FLINK-2583:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1084#issuecomment-137393554

    I think using truncate for exactly-once is the way to go.

    To support users with older HDFS versions, how about this:

    1. We consider valid only what was successfully written at a checkpoint
       (hflush/hsync). When we roll over to a new file on restart, we write a
       `.length` file for the previous file that records how many bytes of it
       are valid. This essentially simulates truncate by adding a metadata file.

    2. Optionally, the user can activate a merge-on-roll-over that takes all
       the files from the attempts, together with their metadata files, and
       merges them into one file. This rollover can be written so that it works
       incrementally, retries on failures, etc.


> Add Stream Sink For Rolling HDFS Files
> --------------------------------------
>
>                 Key: FLINK-2583
>                 URL: https://issues.apache.org/jira/browse/FLINK-2583
>             Project: Flink
>          Issue Type: New Feature
>      Components: Streaming
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>             Fix For: 0.10
>
>
> In addition to having configurable file-rolling behavior, the Sink should also
> integrate with checkpointing to make it possible to have exactly-once
> semantics throughout the topology.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
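The `.length`-file idea from the comment above can be sketched as follows. This is a minimal, hypothetical illustration (class and method names are invented, and the local filesystem stands in for HDFS): after a checkpoint, the sink records how many bytes of a part file are valid; on recovery, a reader consults the companion `.length` file and ignores any trailing bytes written after the last successful checkpoint.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LengthFileSketch {

    // After a successful checkpoint (hflush/hsync in real HDFS), record how
    // many bytes of `part` are considered valid in a companion ".length" file.
    static void writeLengthFile(Path part, long validLength) throws IOException {
        Path lengthFile = part.resolveSibling(part.getFileName() + ".length");
        Files.write(lengthFile, Long.toString(validLength).getBytes(StandardCharsets.UTF_8));
    }

    // A reader checks for the ".length" file; if present, it truncates the
    // content logically by returning only the valid prefix.
    static byte[] readValidBytes(Path part) throws IOException {
        Path lengthFile = part.resolveSibling(part.getFileName() + ".length");
        byte[] all = Files.readAllBytes(part);
        if (!Files.exists(lengthFile)) {
            return all; // no metadata file: the whole file is valid
        }
        String recorded = new String(Files.readAllBytes(lengthFile), StandardCharsets.UTF_8);
        int valid = Integer.parseInt(recorded.trim());
        byte[] result = new byte[valid];
        System.arraycopy(all, 0, result, 0, valid);
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("length-sketch");
        Path part = dir.resolve("part-0");
        // Suppose 10 bytes were flushed at the last checkpoint, and 5 more
        // ("EXTRA") were written before the failure; those must be ignored.
        Files.write(part, "0123456789EXTRA".getBytes(StandardCharsets.UTF_8));
        writeLengthFile(part, 10);
        System.out.println(new String(readValidBytes(part), StandardCharsets.UTF_8));
        // prints "0123456789"
    }
}
```

The optional merge-on-roll-over from point 2 would then concatenate the valid prefixes of all attempt files into a single file, which is naturally idempotent and therefore safe to retry on failure.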