Hi all,

Thanks for your participation.

In this thread, we got one +1 each for option 1 and option 3. In the original thread [1], we got two +1s for option 1, one +1 for option 2, and five +1s and one -1 for option 3.

To summarize,

Option 1 (port side output to flatMap and deprecate split/select): three +1
Option 2 (introduce a new split/select and deprecate the existing one): one +1
Option 3 ("correct" the existing split/select): six +1 and one -1

It seems that most people involved are in favor of "correcting" the existing split/select. However, this will definitely break API compatibility, in a subtle way.

IMO, the real behavior of consecutive split/selects has never been thoroughly clarified. Even within the community, it is hard to say that we have reached a consensus on its real semantics [2-4]. Though the initial design is not ambiguous, there's no doubt that its concept has drifted.

As split/select is quite an old API, I've cc'ed this to more members. It would be great if you could share your opinions on this.

Thanks,
Xingcan

[1] https://lists.apache.org/thread.html/f94ea5c97f96c705527dcc809b0e2b69e87a4c5d400cb7c61859e1f4@%3Cdev.flink.apache.org%3E
[2] https://issues.apache.org/jira/browse/FLINK-1772
[3] https://issues.apache.org/jira/browse/FLINK-5031
[4] https://issues.apache.org/jira/browse/FLINK-11084

> On Jul 5, 2019, at 12:04 AM, 杨力 <bill.le...@gmail.com> wrote:
>
> I prefer the 1) approach. I used to carry fields, which are needed only for
> splitting, in the outputs of flatMap functions. Replacing them with outputTags
> would simplify data structures.
>
> Xingcan Cui <xingc...@gmail.com> wrote on Fri, Jul 5, 2019 at 2:20 AM:
> Hi folks,
>
> Two weeks ago, I started a thread [1] discussing whether we should discard
> the split/select methods (which have been marked as deprecated since v1.7)
> in DataStream API.
>
> The fact is, these methods will cause "unexpected" results when used
> consecutively (e.g., ds.split(a).select(b).split(c).select(d)) or multiple
> times on the same target (e.g., ds.split(a).select(b), ds.split(c).select(d)).
> The reason is that, following the initial design, the new split/select logic
> will always override the existing one on the same target operator, rather than
> append to it. Some users may not be aware of that, but if you are, a current
> solution would be to use the more powerful side output feature [2].
>
> FLINK-11084 <https://issues.apache.org/jira/browse/FLINK-11084> added some
> restrictions to the existing split/select logic and suggested replacing it
> with side output in the future. However, considering that side output is
> currently only available in the process function layer and that split/select
> could have been widely used in many real-world applications, we'd like to
> start a vote and listen to the community on how to deal with them.
>
> In the discussion thread [1], we proposed three solutions as follows. All of
> them are feasible but have different impacts on the public API.
>
> 1) Port the side output feature to DataStream API's flatMap and replace
> split/select with it.
>
> 2) Introduce a dedicated function in DataStream API (with the "correct"
> behavior but a different name) that can be used to replace the existing
> split/select.
>
> 3) Keep split/select but change the behavior/semantics to be "correct".
>
> Note that this is just a vote for gathering information, so feel free to
> participate and share your opinions.
>
> The voting time will end on July 7th 17:00 EDT.
>
> Thanks,
> Xingcan
>
> [1] https://lists.apache.org/thread.html/f94ea5c97f96c705527dcc809b0e2b69e87a4c5d400cb7c61859e1f4@%3Cdev.flink.apache.org%3E
> [2] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/side_output.html
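
To make the override-versus-append behavior described above concrete, here is a minimal toy model in plain Java. It is only a sketch: SplitSelectModel, Target, splitOverride, splitAppend, select, and run are hypothetical names invented for illustration. They are not Flink classes or methods, and Flink's actual internals are not modeled. The only thing assumed is what the thread states, namely that a later split/select on the same target replaces the earlier one instead of being appended to it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model only; nothing here is part of the Flink API. */
public class SplitSelectModel {

    /** Hypothetical stand-in for a target operator that holds selector predicates. */
    static final class Target {
        private final List<Integer> input;
        private final List<Predicate<Integer>> selectors = new ArrayList<>();

        Target(List<Integer> input) {
            this.input = input;
        }

        /** "Override" semantics: a new selector replaces any selector registered earlier. */
        void splitOverride(Predicate<Integer> selector) {
            selectors.clear();
            selectors.add(selector);
        }

        /** "Append" semantics: a new selector is chained after the earlier ones. */
        void splitAppend(Predicate<Integer> selector) {
            selectors.add(selector);
        }

        /** Emits the elements accepted by every selector currently registered. */
        List<Integer> select() {
            List<Integer> out = new ArrayList<>();
            for (Integer value : input) {
                boolean keep = true;
                for (Predicate<Integer> selector : selectors) {
                    if (!selector.test(value)) {
                        keep = false;
                        break;
                    }
                }
                if (keep) {
                    out.add(value);
                }
            }
            return out;
        }
    }

    /** Applies "even" and then "> 5" to the numbers 1..10 under the chosen semantics. */
    static List<Integer> run(boolean append) {
        List<Integer> input = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            input.add(i);
        }
        Target target = new Target(input);
        Predicate<Integer> isEven = v -> v % 2 == 0;
        Predicate<Integer> greaterThanFive = v -> v > 5;
        if (append) {
            target.splitAppend(isEven);
            target.splitAppend(greaterThanFive);
        } else {
            target.splitOverride(isEven);
            target.splitOverride(greaterThanFive);
        }
        return target.select();
    }

    public static void main(String[] args) {
        System.out.println("override: " + run(false)); // override: [6, 7, 8, 9, 10]
        System.out.println("append:   " + run(true));  // append:   [6, 8, 10]
    }
}
```

In this model, a user who writes two consecutive selections would probably expect the "append" result (even numbers greater than 5), while the "override" variant silently drops the first condition. That gap is the kind of surprise the three options try to address.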