[ https://issues.apache.org/jira/browse/FLINK-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728770#comment-14728770 ]

ASF GitHub Bot commented on FLINK-2583:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1084#issuecomment-137393554
  
    I think using truncate for exactly-once is the way to go. To support users 
with older HDFS versions, how about this:
    
    1. We consider valid only what was successfully written at a checkpoint 
(hflush/hsync). When we roll over to a new file on restart, we write a 
`.length` file for the previous file that indicates how many bytes in it are 
valid. Basically, we simulate truncate by adding a metadata file.
    
    2. Optionally, the user can activate a merge on roll-over that takes all 
the files from the attempts, together with their metadata files, and merges 
them into one file. This merge can be written so that it works incrementally 
and retries on failures.
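
    A minimal sketch of the `.length` idea from point 1, using plain 
`java.nio` on a local filesystem in place of the HDFS client API (the class 
and method names here are hypothetical, not from the Flink sink):

    ```java
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Arrays;

    // Sketch: simulate truncate with a ".length" metadata file next to
    // each abandoned part file. Readers ignore bytes past the recorded
    // checkpointed length.
    public class LengthFileSketch {

        // On restart, record how many bytes of the abandoned part file
        // were covered by the last successful checkpoint (hflush/hsync).
        static void writeLengthFile(Path partFile, long validLength) throws IOException {
            Path lengthFile = partFile.resolveSibling(partFile.getFileName() + ".length");
            Files.write(lengthFile, Long.toString(validLength).getBytes(StandardCharsets.UTF_8));
        }

        // Readers consult the ".length" file (if present) and drop any
        // bytes written after the checkpoint -- the "simulated truncate".
        static byte[] readValidBytes(Path partFile) throws IOException {
            byte[] all = Files.readAllBytes(partFile);
            Path lengthFile = partFile.resolveSibling(partFile.getFileName() + ".length");
            if (!Files.exists(lengthFile)) {
                return all; // no restart happened; the whole file is valid
            }
            long valid = Long.parseLong(
                    new String(Files.readAllBytes(lengthFile), StandardCharsets.UTF_8).trim());
            return Arrays.copyOf(all, (int) Math.min(valid, all.length));
        }

        public static void main(String[] args) throws IOException {
            Path dir = Files.createTempDirectory("rolling-sink");
            Path part = dir.resolve("part-0-0");
            // 10 bytes were checkpointed, then 5 more landed before the failure.
            Files.write(part, "0123456789EXTRA".getBytes(StandardCharsets.UTF_8));
            writeLengthFile(part, 10);
            String valid = new String(readValidBytes(part), StandardCharsets.UTF_8);
            System.out.println(valid); // prints "0123456789"
        }
    }
    ```

    The merge in point 2 would then be a reader of these pairs: concatenate 
each part file truncated to its `.length` into one output file, which is safe 
to retry because re-reading the same pairs produces the same bytes.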



> Add Stream Sink For Rolling HDFS Files
> --------------------------------------
>
>                 Key: FLINK-2583
>                 URL: https://issues.apache.org/jira/browse/FLINK-2583
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>             Fix For: 0.10
>
>
> In addition to having configurable file-rolling behavior the Sink should also 
> integrate with checkpointing to make it possible to have exactly-once 
> semantics throughout the topology.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)