[ https://issues.apache.org/jira/browse/FLINK-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397229#comment-15397229 ]
ASF GitHub Bot commented on FLINK-4271: --------------------------------------- GitHub user wuchong opened a pull request: https://github.com/apache/flink/pull/2305 [FLINK-4271] [DataStreamAPI] Enable CoGroupedStreams and JoinedStreams to set parallellism Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html). In addition to going through the list, please provide a meaningful description of your changes. - [x] General - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text") - The pull request addresses only one issue - Each commit in the PR has a meaningful commit message (including the JIRA id) - [ ] Documentation - Documentation has been added for new functionality - Old documentation affected by the pull request has been updated - JavaDoc for public methods has been added - [x] Tests & Build - Functionality added by the pull request is covered by tests - `mvn clean verify` has been executed successfully locally or a Travis build has passed The CoGroupStream will construct the following graph. ``` source -> MAP --- |-> WindowOp -> Sink source -> MAP --- ``` By now , the MAP and WindowOp can not set parallelism. We can keep the MAP has same parallelism as previous operator (chaining). And we can change {{CoGroupedStreams.apply}} to return a {{SingleOutputStreamOperator}} instead of {{DataStream}}, so that we can set WindowOp's parallelism. The same thing has be done to {{JoinedStream}}. So that we can do the following things: ``` DataStream<T> result = one.coGroup(two) .where(new MyFirstKeySelector()) .equalTo(new MyFirstKeySelector()) .window(TumblingEventTimeWindows.of(Time.of(5, TimeUnit.SECONDS))) .apply(new MyCoGroupFunction()); .setParallelism(10) .name("MyCoGroupWindow") ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/wuchong/flink CoGroupStreams Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2305.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2305 ---- commit 7b9594a175f33e62826a0cb51380f33dec5857b6 Author: Jark Wu <wuchong...@alibaba-inc.com> Date: 2016-07-28T06:32:13Z [FLINK-4271] [DataStreamAPI] Enable CoGroupedStreams and JoinedStreams to set parallelism. ---- > There is no way to set parallelism of operators produced by CoGroupedStreams > ---------------------------------------------------------------------------- > > Key: FLINK-4271 > URL: https://issues.apache.org/jira/browse/FLINK-4271 > Project: Flink > Issue Type: Bug > Components: DataStream API > Reporter: Wenlong Lyu > Assignee: Jark Wu > > Currently, CoGroupStreams package the map/keyBy/window operators with a human > friendly interface, like: > dataStreamA.cogroup(streamB).where(...).equalsTo().window().apply(), both the > intermediate operators and final window operators can not be accessed by > users, and we cannot set attributes of the operators, which make co-group > hard to use in production environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)