shouweikun edited a comment on pull request #14727: URL: https://github.com/apache/flink/pull/14727#issuecomment-770783211
> Hi @shouweikun , I have went through the pull request. However, supporting sink parallelism for Hive and Filesytem is not just changing parallelism of the writer DataStream. We should first support `ParallelismProvider` for `DataStreamSinkProvider` first. Because if the sink parallelism is different than the upstream operator, we should implicitly add a keyby shuffle if there is changelog in the stream, otherwise the chvangelog will be out of order. See > > https://github.com/apache/flink/blob/95257a255f0da0a95b31647c6d057914d5748871/flink-table/flink-table-planner-blink/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/common/CommonExecSink.java#L116 Hi @wuchong, thanks for ur review and comment. As far as I’m concerned, it shall be under well discussed whether `DataStreamSinkProvider` inherits `ParallelismProvider` or not. Referring to [FLIP-146](https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces), only `SinkFunctionProvider` and `OutputFormatProvider ` inherit `ParallelismProvider`. Well, correct me if I'm wrong, both `DatastreamScanProvider` and `DataStreamSinkProvider` are designed for advanced user, which means that once user choose `DataStreamProvider`, the more freedom the user has and the more responsibility the user take. What 's more, `DataStreamSinkProvider` inheriting `ParallelismProvider` may still not guarantee the parallelism configured correctly, cuz user can do anything in `DataStreamSinkProvider#getRuntimeProvider`. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org