[ https://issues.apache.org/jira/browse/FLINK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818230#comment-15818230 ]
ASF GitHub Bot commented on FLINK-3637:
---------------------------------------

Github user shashank734 commented on the issue:

    https://github.com/apache/flink/pull/1826

    @dalegaard Have you created a Parquet writer for this, or can you give me an idea of how I can sink JSON -> Parquet -> HDFS from a DataStream or a streaming Table?

> Change RollingSink Writer interface to allow wider range of outputs
> -------------------------------------------------------------------
>
>                 Key: FLINK-3637
>                 URL: https://issues.apache.org/jira/browse/FLINK-3637
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming Connectors
>            Reporter: Lasse Dalegaard
>            Assignee: Lasse Dalegaard
>              Labels: features
>             Fix For: 1.1.0
>
>
> Currently the RollingSink Writer interface only works with
> FSDataOutputStreams, which precludes it from being used with some existing
> libraries like Apache ORC and Parquet.
> To fix this, a new Writer interface can be created, which receives FileSystem
> and Path objects instead of an FSDataOutputStream.
> To ensure exactly-once semantics, the Writer interface must also be extended
> so that the current write offset can be retrieved at checkpointing time. For
> formats like ORC this requires a footer to be written before the offset is
> returned. Checkpointing already calls flush on the writer, but either flush
> needs to return the current length of the output file, or a new method has
> to be added for this.
> The existing Writer interface can be recreated with a wrapper on top of the
> new Writer interface. The existing code that manages the FSDataOutputStream
> can then be moved into this new wrapper.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
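
The interface change proposed in the issue description could look roughly like the following. This is a minimal, dependency-free sketch under stated assumptions, not Flink's actual API: PathWriter, StreamWriter, StreamWriterAdapter, CountingOutputStream and LineWriter are hypothetical names, and java.nio.file.Path stands in for Hadoop's FileSystem and Path so that the example runs with only the JDK.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical path-based writer: it receives a location and opens the file
// itself, so formats that manage their own files could implement it.
interface PathWriter<T> {
    void open(Path path) throws IOException;
    void write(T element) throws IOException;
    // Flushes pending data (and a footer, if the format needs one) and
    // returns the file length that is valid at checkpoint time.
    long flushAndGetOffset() throws IOException;
    void close() throws IOException;
}

// Hypothetical stand-in for the existing stream-based writer contract.
interface StreamWriter<T> {
    void open(OutputStream out) throws IOException;
    void write(T element) throws IOException;
    void flush() throws IOException;
    void close() throws IOException;
}

// Counts the bytes handed to the underlying stream so an offset can be reported.
final class CountingOutputStream extends OutputStream {
    private final OutputStream inner;
    private long count;

    CountingOutputStream(OutputStream inner) { this.inner = inner; }

    @Override public void write(int b) throws IOException { inner.write(b); count++; }
    @Override public void write(byte[] b, int off, int len) throws IOException {
        inner.write(b, off, len);
        count += len;
    }
    @Override public void flush() throws IOException { inner.flush(); }
    @Override public void close() throws IOException { inner.close(); }

    long count() { return count; }
}

// Wrapper that recreates the stream-based contract on top of PathWriter;
// the stream management lives here instead of in the sink.
final class StreamWriterAdapter<T> implements PathWriter<T> {
    private final StreamWriter<T> delegate;
    private CountingOutputStream out;

    StreamWriterAdapter(StreamWriter<T> delegate) { this.delegate = delegate; }

    @Override public void open(Path path) throws IOException {
        out = new CountingOutputStream(Files.newOutputStream(path));
        delegate.open(out);
    }
    @Override public void write(T element) throws IOException { delegate.write(element); }
    @Override public long flushAndGetOffset() throws IOException {
        delegate.flush();
        out.flush();
        return out.count();
    }
    @Override public void close() throws IOException {
        delegate.close();
        out.close();
    }
}

// Example stream-based writer: one UTF-8 line per element.
final class LineWriter implements StreamWriter<String> {
    private OutputStream out;

    @Override public void open(OutputStream out) { this.out = out; }
    @Override public void write(String element) throws IOException {
        out.write((element + "\n").getBytes(StandardCharsets.UTF_8));
    }
    @Override public void flush() throws IOException { out.flush(); }
    @Override public void close() throws IOException { out.close(); }
}

class RollingWriterSketch {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("rolling-writer-sketch", ".txt");
        PathWriter<String> writer = new StreamWriterAdapter<>(new LineWriter());
        writer.open(file);
        writer.write("a");
        writer.write("bc");
        // The offset a sink could store in its checkpoint state.
        System.out.println(writer.flushAndGetOffset());
        writer.close();
        Files.delete(file);
    }
}
```

A Parquet or ORC writer would implement the path-based interface directly rather than go through the adapter, since those libraries open and finalize their own files; how they would report a valid offset is left open here, as it is in the issue.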