I think this would benefit from a FLIP that neatly sums up the options and also gives us a point where we can vote and ratify a decision.
As a gut feeling, I most like Option 3). Initially I would have preferred option 1) (because of a sense of API purity), but by now I think it’s good that users have this simpler option.

Aljoscha

> On 8. Jul 2019, at 06:39, Xingcan Cui <xingc...@gmail.com> wrote:
>
> Hi all,
>
> Thanks for your participation.
>
> In this thread, we got one +1 each for option 1 and option 3. In the
> original thread [1], we got two +1 for option 1, one +1 for option 2, and five
> +1 and one -1 for option 3.
>
> To summarize,
>
> Option 1 (port side output to flatMap and deprecate split/select): three +1
> Option 2 (introduce a new split/select and deprecate existing one): one +1
> Option 3 ("correct" the existing split/select): six +1 and one -1
>
> It seems that most people involved are in favor of "correcting" the existing
> split/select. However, this will definitely break API compatibility, in a
> subtle way.
>
> IMO, the real behavior of consecutive split/selects has never been
> thoroughly clarified. Even within the community, it is hard to say that we have
> reached a consensus on its real semantics [2-4]. Though the initial design is
> not ambiguous, there's no doubt that its concept has drifted.
>
> As split/select is quite an ancient API, I cc'ed this to more members. It
> would be great if you could share your opinions on this.
>
> Thanks,
> Xingcan
>
> [1] https://lists.apache.org/thread.html/f94ea5c97f96c705527dcc809b0e2b69e87a4c5d400cb7c61859e1f4@%3Cdev.flink.apache.org%3E
> [2] https://issues.apache.org/jira/browse/FLINK-1772
> [3] https://issues.apache.org/jira/browse/FLINK-5031
> [4] https://issues.apache.org/jira/browse/FLINK-11084
>
>> On Jul 5, 2019, at 12:04 AM, 杨力 <bill.le...@gmail.com> wrote:
>>
>> I prefer the 1) approach. I used to carry fields, which are needed only for
>> splitting, in the outputs of flatMap functions. Replacing them with outputTags
>> would simplify data structures.
>>
>> On Fri, Jul 5, 2019 at 2:20 AM, Xingcan Cui <xingc...@gmail.com> wrote:
>> Hi folks,
>>
>> Two weeks ago, I started a thread [1] discussing whether we should discard
>> the split/select methods (which have been marked as deprecated since v1.7)
>> in DataStream API.
>>
>> The fact is, these methods will cause "unexpected" results when used
>> consecutively (e.g., ds.split(a).select(b).split(c).select(d)) or
>> multiple times on the same target (e.g., ds.split(a).select(b),
>> ds.split(c).select(d)). The reason is that, following the initial design, the
>> new split/select logic will always override the existing one on the same
>> target operator, rather than append to it. Some users may not be aware of
>> that, but if you are, a current solution would be to use the more powerful
>> side output feature [2].
>>
>> FLINK-11084 <https://issues.apache.org/jira/browse/FLINK-11084> added some
>> restrictions to the existing split/select logic and suggests replacing it
>> with side output in the future. However, considering that side output is
>> currently only available in the process function layer and that split/select
>> may have been widely used in many real-world applications, we'd like to
>> start a vote and listen to the community on how to deal with them.
>>
>> In the discussion thread [1], we proposed three solutions as follows. All of
>> them are feasible but have different impacts on the public API.
>>
>> 1) Port the side output feature to DataStream API's flatMap and replace
>> split/select with it.
>>
>> 2) Introduce a dedicated function in DataStream API (with the "correct"
>> behavior but a different name) that can be used to replace the existing
>> split/select.
>>
>> 3) Keep split/select but change the behavior/semantics to be "correct".
>>
>> Note that this is just a vote for gathering information, so feel free to
>> participate and share your opinions.
>>
>> The voting time will end on July 7th 17:00 EDT.
>>
>> Thanks,
>> Xingcan
>>
>> [1] https://lists.apache.org/thread.html/f94ea5c97f96c705527dcc809b0e2b69e87a4c5d400cb7c61859e1f4@%3Cdev.flink.apache.org%3E
>> [2] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/side_output.html
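
For readers who want the difference between the "override" behavior described in the quoted message and the "append" behavior that option 3 aims for spelled out, here is a toy model in plain Java. It is only a sketch under assumptions: it uses ordinary collections rather than Flink's DataStream API, and the names SplitSelectModel, Step, overrideSemantics and appendSemantics are made up for illustration. It mirrors the thread's description of consecutive split/select calls, not Flink's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model only: plain Java collections, not Flink's DataStream API.
class SplitSelectModel {

    // One split/select pair: a function that tags each element,
    // plus the tag that the following select(...) keeps.
    static final class Step {
        final Function<Integer, String> tagger;
        final String selected;

        Step(Function<Integer, String> tagger, String selected) {
            this.tagger = tagger;
            this.selected = selected;
        }

        boolean keeps(int value) {
            return selected.equals(tagger.apply(value));
        }
    }

    // "Override" as described in the thread: a later split/select replaces
    // the earlier one, so only the last step has any effect.
    static List<Integer> overrideSemantics(List<Integer> input, List<Step> steps) {
        List<Integer> out = new ArrayList<>();
        if (steps.isEmpty()) {
            out.addAll(input);
            return out;
        }
        Step last = steps.get(steps.size() - 1);
        for (int value : input) {
            if (last.keeps(value)) {
                out.add(value);
            }
        }
        return out;
    }

    // "Append": each split/select filters the result of the previous one,
    // which is what a chained call reads like at first glance.
    static List<Integer> appendSemantics(List<Integer> input, List<Step> steps) {
        List<Integer> out = new ArrayList<>();
        for (int value : input) {
            boolean keep = true;
            for (Step step : steps) {
                if (!step.keeps(value)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                out.add(value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 10, 11, 12);
        // Models ds.split(evenOdd).select("even").split(size).select("large")
        List<Step> steps = List.of(
            new Step(v -> v % 2 == 0 ? "even" : "odd", "even"),
            new Step(v -> v >= 10 ? "large" : "small", "large"));

        System.out.println(overrideSemantics(input, steps)); // [10, 11, 12]
        System.out.println(appendSemantics(input, steps));   // [10, 12]
    }
}
```

In this model the odd value 11 survives under override semantics even though the first step selected only even numbers, which is the kind of "unexpected" result the thread refers to.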