[ 
https://issues.apache.org/jira/browse/FLINK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818230#comment-15818230
 ] 

ASF GitHub Bot commented on FLINK-3637:
---------------------------------------

Github user shashank734 commented on the issue:

    https://github.com/apache/flink/pull/1826
  
    @dalegaard, have you created a Parquet writer for this, or can you give me 
an idea of how I can sink JSON -> Parquet -> HDFS from a DataStream or a 
streaming Table?


> Change RollingSink Writer interface to allow wider range of outputs
> -------------------------------------------------------------------
>
>                 Key: FLINK-3637
>                 URL: https://issues.apache.org/jira/browse/FLINK-3637
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming Connectors
>            Reporter: Lasse Dalegaard
>            Assignee: Lasse Dalegaard
>              Labels: features
>             Fix For: 1.1.0
>
>
> Currently the RollingSink Writer interface only works with 
> FSDataOutputStreams, which precludes it from being used with some existing 
> libraries like Apache ORC and Parquet.
> To fix this, a new Writer interface can be created, which receives FileSystem 
> and Path objects, instead of FSDataOutputStream.
> To ensure exactly-once semantics, the Writer interface must also be extended 
> so that the current write offset can be retrieved at checkpointing time. For 
> formats like ORC, this requires a footer to be written before the offset is 
> returned. Checkpointing already calls flush on the writer, so either flush 
> needs to return the current length of the output file, or a new method has 
> to be added for this.
> The existing Writer interface can be recreated with a wrapper on top of the 
> new Writer interface. The existing code that manages the FSDataOutputStream 
> can then be moved into this new wrapper.
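
A rough sketch of the shape described above, not the interface that was merged
into Flink. To stay self-contained it uses java.nio.file.Path as a stand-in for
Hadoop's FileSystem and Path, and java.io.OutputStream as a stand-in for
FSDataOutputStream. Every name here (PathWriter, StreamWriter,
StreamWriterAdapter, CountingStream) is hypothetical.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriterSketch {

    /** Hypothetical new-style writer: receives a location instead of an open stream. */
    public interface PathWriter<T> {
        void open(Path path) throws IOException;
        void write(T element) throws IOException;
        /** Flushes buffered data and returns the current length of the output in bytes. */
        long flush() throws IOException;
        void close() throws IOException;
    }

    /** Hypothetical old-style writer: only sees a stream that someone else manages. */
    public interface StreamWriter<T> {
        void write(T element, OutputStream out) throws IOException;
    }

    /** Counts bytes so the wrapper can report a write offset at checkpoint time. */
    static final class CountingStream extends FilterOutputStream {
        long count;

        CountingStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
            count += len;
        }
    }

    /** Wrapper that recreates the stream-based interface on top of PathWriter. */
    public static final class StreamWriterAdapter<T> implements PathWriter<T> {
        private final StreamWriter<T> delegate;
        private CountingStream stream;

        public StreamWriterAdapter(StreamWriter<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public void open(Path path) throws IOException {
            // Stream management lives in the wrapper, not in the sink.
            stream = new CountingStream(Files.newOutputStream(path));
        }

        @Override
        public void write(T element) throws IOException {
            delegate.write(element, stream);
        }

        @Override
        public long flush() throws IOException {
            stream.flush();
            return stream.count;
        }

        @Override
        public void close() throws IOException {
            stream.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("rolling-sink-sketch", ".txt");
        PathWriter<String> writer = new StreamWriterAdapter<String>(
            (line, out) -> out.write((line + "\n").getBytes(StandardCharsets.UTF_8)));
        writer.open(file);
        writer.write("first record");
        long offset = writer.flush();
        System.out.println("valid length at checkpoint: " + offset);
        writer.close();
    }
}
```

Under this sketch, a writer for a format such as ORC or Parquet would
implement PathWriter directly and write whatever footer it needs inside
flush() before returning the length. The adapter covers only the plain
stream case.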



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
