Hi folks,

Two weeks ago, I started a thread [1] discussing whether we should remove the split/select methods (deprecated since v1.7) from the DataStream API.
The problem is that these methods can produce "unexpected" results when used consecutively (e.g., ds.split(a).select(b).split(c).select(d)) or multiple times on the same target (e.g., ds.split(a).select(b), ds.split(c).select(d)). The reason is that, by the initial design, a new split/select always overrides the existing one on the same target operator rather than appending to it. Some users may not be aware of this. For those who are, the current workaround is to use the more powerful side output feature [2].

FLINK-11084 <https://issues.apache.org/jira/browse/FLINK-11084> added some restrictions to the existing split/select logic and suggested replacing it with side outputs in the future. However, side outputs are currently available only at the process function layer, and split/select may already be widely used in many real-world applications. We'd therefore like to start a vote and listen to the community on how to deal with them.

In the discussion thread [1], we proposed the following three solutions. All of them are feasible, but they have different impacts on the public API:

1) Port the side output feature to the DataStream API's flatMap and replace split/select with it.
2) Introduce a dedicated function in the DataStream API (with the "correct" behavior but a different name) that can replace the existing split/select.
3) Keep split/select but change their behavior/semantics to be "correct".

Note that this vote is only for gathering information, so feel free to participate and share your opinions. Voting will close on *July 7th, 17:00 EDT*.

Thanks,
Xingcan

[1] https://lists.apache.org/thread.html/f94ea5c97f96c705527dcc809b0e2b69e87a4c5d400cb7c61859e1f4@%3Cdev.flink.apache.org%3E
[2] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/side_output.html