Hi folks,

Two weeks ago, I started a thread [1] about whether we should drop the
split/select methods in the DataStream API, which have been marked as
deprecated since v1.7.

The fact is, these methods produce "unexpected" results when used
consecutively (e.g., ds.split(a).select(b).split(c).select(d)) or multiple
times on the same target (e.g., ds.split(a).select(b),
ds.split(c).select(d)). The reason is that, by the original design, a new
split/select always overrides the existing one on the same target operator
rather than appending to it. Some users may not be aware of this; for those
who are, the current workaround is to use the more powerful side output
feature [2].
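To make the side output alternative concrete, here is a minimal plain-Java model, not Flink code. The names `SideOutputModel`, `Emitter`, and `process` are hypothetical stand-ins. In Flink itself the corresponding pieces are an `OutputTag`, `Context#output` inside a `ProcessFunction`, and `SingleOutputStreamOperator#getSideOutput`. The point it sketches: each stage emits records to named outputs in a single pass, so a nested split becomes a second, independent stage and nothing is overridden.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class SideOutputModel {

    // Hypothetical stand-in for a side-output context: the user function
    // may emit each record to any number of named outputs.
    interface Emitter<T> {
        void output(String tag, T value);
    }

    // Runs one "process" pass over the input and collects every tagged output,
    // keeping tags in first-emitted order.
    static <T> Map<String, List<T>> process(List<T> input, BiConsumer<T, Emitter<T>> fn) {
        Map<String, List<T>> outputs = new LinkedHashMap<>();
        Emitter<T> emitter = (tag, value) ->
                outputs.computeIfAbsent(tag, k -> new ArrayList<>()).add(value);
        for (T record : input) {
            fn.accept(record, emitter);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(1, 2, 3, 4, 10, 12);

        // First stage: roughly what ds.split(a).select(b) expresses.
        Map<String, List<Integer>> byParity = process(source, (x, out) ->
                out.output(x % 2 == 0 ? "even" : "odd", x));

        // Second stage applied only to the "even" output (the nested
        // split(c).select(d)); it leaves the first stage's routing untouched.
        Map<String, List<Integer>> evenBySize = process(byParity.get("even"), (x, out) ->
                out.output(x >= 10 ? "large" : "small", x));

        System.out.println(byParity);   // {odd=[1, 3], even=[2, 4, 10, 12]}
        System.out.println(evenBySize); // {small=[2, 4], large=[10, 12]}
    }
}
```

Because each stage owns its own tags, chaining or repeating the routing composes instead of replacing an earlier selection, which is the "correct" behavior the options below aim for.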

FLINK-11084 <https://issues.apache.org/jira/browse/FLINK-11084> added some
restrictions to the existing split/select logic and suggested replacing it
with side output in the future. However, since side output is currently only
available at the process function layer, and split/select may be widely used
in real-world applications, we'd like to start a vote and listen to the
community on how to deal with them.

In the discussion thread [1], we proposed the following three solutions. All
of them are feasible, but they affect the public API differently.

1) Port the side output feature to the DataStream API's flatMap and replace
split/select with it.

2) Introduce a dedicated function in DataStream API (with the "correct"
behavior but a different name) that can be used to replace the existing
split/select.

3) Keep split/select but change the behavior/semantics to be "correct".

Note that this is just a vote for gathering information, so feel free to
participate and share your opinions.

The voting time will end on *July 7th 17:00 EDT*.

Thanks,
Xingcan

[1]
https://lists.apache.org/thread.html/f94ea5c97f96c705527dcc809b0e2b69e87a4c5d400cb7c61859e1f4@%3Cdev.flink.apache.org%3E
[2]
https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/side_output.html
