[GitHub] [flink] shouweikun edited a comment on pull request #14727: [FLINK-19945][Connectors / FileSystem]Support sink parallelism config…

GitBox Mon, 01 Feb 2021 03:30:06 -0800


shouweikun edited a comment on pull request #14727:
URL: https://github.com/apache/flink/pull/14727#issuecomment-770783211



   > Hi @shouweikun, I have gone through the pull request. However, supporting 
sink parallelism for Hive and Filesystem is not just a matter of changing the 
parallelism of the writer DataStream. We should first support 
`ParallelismProvider` for `DataStreamSinkProvider`, because if the sink 
parallelism differs from that of the upstream operator, we should implicitly 
add a keyBy shuffle when the stream contains a changelog; otherwise the 
changelog will be out of order. See
   > 
   > 
https://github.com/apache/flink/blob/95257a255f0da0a95b31647c6d057914d5748871/flink-table/flink-table-planner-blink/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/common/CommonExecSink.java#L116
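
A minimal sketch of the ordering concern in the quoted comment, in plain Java rather than Flink code. The class `ChangelogShuffleSketch`, its `Change` type, and the two routing helpers are hypothetical names used only for illustration. It counts how many sink subtasks receive changes for the same key: with round-robin routing a key's changes are spread over several subtasks, so their relative order is no longer guaranteed, while hash routing by key (the idea behind a keyBy shuffle) keeps each key on one subtask.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ChangelogShuffleSketch {

    /** One changelog record; the kind strings mirror Flink's RowKind short names. */
    static final class Change {
        final String kind;
        final String key;
        final int value;

        Change(String kind, String key, int value) {
            this.kind = kind;
            this.key = key;
            this.value = value;
        }
    }

    /** A small changelog touching two keys. */
    static List<Change> sampleChanges() {
        List<Change> changes = new ArrayList<>();
        changes.add(new Change("+I", "a", 1));
        changes.add(new Change("+I", "b", 1));
        changes.add(new Change("-U", "a", 1));
        changes.add(new Change("+U", "a", 2));
        changes.add(new Change("-D", "b", 1));
        return changes;
    }

    /** Round-robin routing: the i-th record goes to subtask i % parallelism. */
    static Map<String, Set<Integer>> routeRoundRobin(List<Change> changes, int parallelism) {
        Map<String, Set<Integer>> subtasksPerKey = new HashMap<>();
        for (int i = 0; i < changes.size(); i++) {
            subtasksPerKey
                    .computeIfAbsent(changes.get(i).key, k -> new HashSet<>())
                    .add(i % parallelism);
        }
        return subtasksPerKey;
    }

    /** Hash routing by key: every record of a key goes to the same subtask. */
    static Map<String, Set<Integer>> routeByKey(List<Change> changes, int parallelism) {
        Map<String, Set<Integer>> subtasksPerKey = new HashMap<>();
        for (Change change : changes) {
            subtasksPerKey
                    .computeIfAbsent(change.key, k -> new HashSet<>())
                    .add(Math.floorMod(change.key.hashCode(), parallelism));
        }
        return subtasksPerKey;
    }

    /** Largest number of distinct subtasks that any single key was routed to. */
    static int maxSubtasksPerKey(Map<String, Set<Integer>> subtasksPerKey) {
        int max = 0;
        for (Set<Integer> subtasks : subtasksPerKey.values()) {
            max = Math.max(max, subtasks.size());
        }
        return max;
    }

    public static void main(String[] args) {
        List<Change> changes = sampleChanges();
        System.out.println("round-robin: " + maxSubtasksPerKey(routeRoundRobin(changes, 2)));
        System.out.println("by key:      " + maxSubtasksPerKey(routeByKey(changes, 2)));
    }
}
```

Any value above 1 means two subtasks may write changes for the same key concurrently, which is the out-of-order case described above.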
   
   Hi @wuchong, thanks for your review and comment. In my view, whether 
`DataStreamSinkProvider` should inherit `ParallelismProvider` needs to be 
discussed thoroughly first. According to 
[FLIP-146](https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces),
 only `SinkFunctionProvider` and `OutputFormatProvider` inherit 
`ParallelismProvider`. Correct me if I'm wrong, but both 
`DataStreamScanProvider` and `DataStreamSinkProvider` are designed for advanced 
users: once users choose a DataStream-based provider, they get more freedom 
and take on more responsibility. What's more, having 
`DataStreamSinkProvider` inherit `ParallelismProvider` still may not 
guarantee that the parallelism is configured correctly, because users can do 
anything in `DataStreamSinkProvider#getRuntimeProvider`.
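
To illustrate the last point, here is a minimal sketch using stand-in types, not Flink's real classes. `FakeStream`, `FakeDataStreamSinkProvider`, `consume`, and `provider` are hypothetical names; only the `Optional<Integer> getParallelism()` shape is modeled on `ParallelismProvider`. It shows that a declared parallelism is only a hint when user code builds the sink topology itself.

```java
import java.util.Optional;

class DeclaredParallelismSketch {

    /** Stand-in for a provider that can declare a sink parallelism. */
    interface ParallelismProvider {
        default Optional<Integer> getParallelism() {
            return Optional.empty();
        }
    }

    /** Stand-in for a stream; it only tracks the parallelism of its last operator. */
    static final class FakeStream {
        final int parallelism;

        FakeStream(int parallelism) {
            this.parallelism = parallelism;
        }

        FakeStream withParallelism(int newParallelism) {
            return new FakeStream(newParallelism);
        }
    }

    /** Stand-in for a DataStream-based sink provider: user code builds the sink itself. */
    interface FakeDataStreamSinkProvider extends ParallelismProvider {
        FakeStream consume(FakeStream input);
    }

    /** A provider that declares one parallelism but may apply another inside consume(). */
    static FakeDataStreamSinkProvider provider(int declared, int applied) {
        return new FakeDataStreamSinkProvider() {
            @Override
            public Optional<Integer> getParallelism() {
                return Optional.of(declared);
            }

            @Override
            public FakeStream consume(FakeStream input) {
                // Nothing forces user code to honor the declared value here.
                return input.withParallelism(applied);
            }
        };
    }

    /** True if the provider declared nothing, or the sink it built uses the declared value. */
    static boolean declaredMatchesActual(FakeDataStreamSinkProvider p, FakeStream upstream) {
        int actual = p.consume(upstream).parallelism;
        return p.getParallelism().map(declared -> declared == actual).orElse(true);
    }

    public static void main(String[] args) {
        FakeStream upstream = new FakeStream(2);
        System.out.println(declaredMatchesActual(provider(4, 4), upstream));
        System.out.println(declaredMatchesActual(provider(4, 8), upstream));
    }
}
```

In this sketch the planner could at best detect the mismatch after the fact; it cannot prevent it.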


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
