Lasse Dalegaard created FLINK-3637:
--------------------------------------

             Summary: Change RollingSink Writer interface to allow wider range of outputs
                 Key: FLINK-3637
                 URL: https://issues.apache.org/jira/browse/FLINK-3637
             Project: Flink
          Issue Type: Improvement
          Components: Streaming Connectors
            Reporter: Lasse Dalegaard


Currently the RollingSink Writer interface only works with an 
FSDataOutputStream, which prevents it from being used with existing libraries 
such as Apache ORC and Parquet, whose writers expect a file system and path 
rather than an already opened stream.

To fix this, a new Writer interface can be created, which receives FileSystem 
and Path objects, instead of FSDataOutputStream.

To ensure exactly-once semantics, the Writer interface must also be extended so 
that the current write offset can be retrieved at checkpointing time. For 
formats like ORC this requires a footer to be written before the offset is 
returned. Checkpointing already calls flush on the writer, so either flush must 
return the current length of the output file, or a new method has to be added 
for this purpose.
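
As a rough illustration of the proposal, the sketch below shows one possible 
shape for such an interface, using the variant in which flush returns the 
current file length. The names PathWriter and LineWriter are hypothetical and 
not part of Flink, and java.nio.file.FileSystem and java.nio.file.Path are used 
as stand-ins for the Hadoop FileSystem and Path classes so that the example 
runs without Hadoop on the classpath.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.Path;

// Hypothetical interface (not the Flink API): the writer receives the file
// system and target path and opens the file itself.
interface PathWriter<T> {
    void open(FileSystem fs, Path path) throws IOException;

    void write(T element) throws IOException;

    // Flushes everything written so far (a format like ORC would write its
    // footer here) and returns the resulting length of the output file.
    long flush() throws IOException;

    void close() throws IOException;
}

// Toy implementation that writes one UTF-8 line per element.
class LineWriter implements PathWriter<String> {
    private OutputStream out;
    private long bytesWritten;

    @Override
    public void open(FileSystem fs, Path path) throws IOException {
        // Stand-in for opening the file through the Hadoop FileSystem.
        out = fs.provider().newOutputStream(path);
        bytesWritten = 0;
    }

    @Override
    public void write(String element) throws IOException {
        byte[] bytes = (element + "\n").getBytes(StandardCharsets.UTF_8);
        out.write(bytes);
        bytesWritten += bytes.length;
    }

    @Override
    public long flush() throws IOException {
        out.flush();
        return bytesWritten; // offset to record in the checkpoint
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

With this shape, the sink would store the value returned by flush in its 
checkpoint state. Adding a separate method for the offset instead would leave 
the signature of flush unchanged.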

The existing Writer interface can be recreated with a wrapper on top of the new 
Writer interface. The existing code that manages the FSDataOutputStream can 
then be moved into this new wrapper.
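
A minimal sketch of such a wrapper is shown below, under the same kind of 
assumptions: StreamWriter is a simplified stand-in for the existing 
stream-based Writer interface, PathWriter is a hypothetical path-based 
interface, and java.io.OutputStream and the java.nio.file types replace 
FSDataOutputStream and the Hadoop FileSystem and Path classes. None of these 
names come from Flink.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.Path;

// Simplified stand-in for the existing stream-based Writer interface.
interface StreamWriter<T> {
    void open(OutputStream out) throws IOException;
    void write(T element) throws IOException;
    void flush() throws IOException;
    void close() throws IOException;
}

// Hypothetical path-based interface; flush returns the current file length.
interface PathWriter<T> {
    void open(FileSystem fs, Path path) throws IOException;
    void write(T element) throws IOException;
    long flush() throws IOException;
    void close() throws IOException;
}

// Counts the bytes passing through so the wrapper can report the offset.
class CountingOutputStream extends OutputStream {
    private final OutputStream delegate;
    private long count;

    CountingOutputStream(OutputStream delegate) { this.delegate = delegate; }

    @Override public void write(int b) throws IOException {
        delegate.write(b);
        count++;
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        delegate.write(b, off, len);
        count += len;
    }

    @Override public void flush() throws IOException { delegate.flush(); }
    @Override public void close() throws IOException { delegate.close(); }

    long getCount() { return count; }
}

// Wrapper that owns the stream handling and exposes a stream-based writer
// through the path-based interface.
class StreamWriterAdapter<T> implements PathWriter<T> {
    private final StreamWriter<T> inner;
    private CountingOutputStream out;

    StreamWriterAdapter(StreamWriter<T> inner) { this.inner = inner; }

    @Override public void open(FileSystem fs, Path path) throws IOException {
        out = new CountingOutputStream(fs.provider().newOutputStream(path));
        inner.open(out);
    }

    @Override public void write(T element) throws IOException { inner.write(element); }

    @Override public long flush() throws IOException {
        inner.flush(); // push any data buffered by the wrapped writer
        out.flush();
        return out.getCount();
    }

    @Override public void close() throws IOException {
        inner.close();
        out.close();
    }
}

// Toy stream-based writer used to exercise the wrapper.
class Utf8LineWriter implements StreamWriter<String> {
    private OutputStream out;

    @Override public void open(OutputStream out) { this.out = out; }

    @Override public void write(String element) throws IOException {
        out.write((element + "\n").getBytes(StandardCharsets.UTF_8));
    }

    @Override public void flush() throws IOException { out.flush(); }
    @Override public void close() throws IOException { out.close(); }
}
```

Existing writers would then keep working unchanged when wrapped, for example 
new StreamWriterAdapter<>(new Utf8LineWriter()), while ORC or Parquet writers 
could implement the path-based interface directly.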



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
