Hi all,

Thank you for starting the discussion. To start with, I have to say I am
not entirely against keeping the split/select methods. On the other hand,
I totally disagree that the semantics are clearly defined. In fact, I
think the design is fundamentally flawed:

 1. We use String as a selector for elements. This is not the cleanest
    design, but I agree it is not the worst.
 2. Users cannot define different types for different splits.
 3. (The actual reason why I think it is better to drop split/select
    and introduce a better mechanism.) What a split actually does is
    add an output selector. We can have just a single selector on a
    single operator, but the API allows (I would even say encourages)
    creating chains of split/select, which leads to undefined behavior.
    Take this for example: ds.split().select("a", "b").select("c", "d").
    Which tags should be forwarded: ("a", "b", "c", "d") (the union) or
    () (the intersection)? In my opinion the most obvious answer in this
    case would be the intersection. Let's modify it slightly, though,
    and I would assume a different behavior (the union):

           splitted = ds.split();

           splitted.select("a", "b").map()

           splitted.select("c", "d").map()
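
To make the two readings concrete, here is a minimal sketch in plain Java.
It is only a toy model: SelectChainModel and its methods are hypothetical
names made up for illustration, and nothing here calls into Flink. It just
computes which tags would be forwarded under each interpretation of
select("a", "b") followed by select("c", "d").

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model only: hypothetical class, not part of Flink.
final class SelectChainModel {

    // Reading 1: each select() widens the set of forwarded tags.
    static Set<String> union(Set<String> first, Set<String> second) {
        Set<String> result = new LinkedHashSet<>(first);
        result.addAll(second);
        return result;
    }

    // Reading 2: each select() filters what the previous select() let through.
    static Set<String> intersection(Set<String> first, Set<String> second) {
        Set<String> result = new LinkedHashSet<>(first);
        result.retainAll(second);
        return result;
    }

    // Small helper to build an ordered tag set.
    static Set<String> tags(String... names) {
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        Set<String> first = tags("a", "b");
        Set<String> second = tags("c", "d");
        System.out.println("union:        " + union(first, second));        // [a, b, c, d]
        System.out.println("intersection: " + intersection(first, second)); // []
    }
}
```

Both results are defensible, which is exactly the problem: the API itself
gives no way to tell which one the user asked for.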

Taking the third point into consideration, I would be in favor of
removing the current mechanism. I think side outputs serve the purpose
much better, with much cleaner semantics. I get the argument that users
are now forced to use a ProcessFunction if they want to use side
outputs. If this is the main problem, how about enabling them for other
functions, e.g. flatMap, as well?
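
As a rough illustration of why explicitly typed, named outputs are cleaner
than String selectors, here is a small sketch in plain Java. It is a toy
model under stated assumptions: TypedTag, TaggedOutputs and SideOutputModel
are hypothetical classes invented for this example, and it does not use
Flink's real side-output API. The point is only that each output carries
its own element type and every record is routed to a named output, so
there is nothing to chain ambiguously.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: all classes below are hypothetical, not Flink classes.
final class SideOutputModel {
    static final TypedTag<Integer> NUMBERS = new TypedTag<>("numbers");
    static final TypedTag<String> REJECTED = new TypedTag<>("rejected");

    // Routes each input to exactly one explicitly named, typed output.
    static TaggedOutputs route(List<String> inputs) {
        TaggedOutputs out = new TaggedOutputs();
        for (String s : inputs) {
            try {
                out.emit(NUMBERS, Integer.parseInt(s));
            } catch (NumberFormatException e) {
                out.emit(REJECTED, s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TaggedOutputs out = route(Arrays.asList("1", "x", "42"));
        System.out.println(out.get(NUMBERS));  // [1, 42]
        System.out.println(out.get(REJECTED)); // [x]
    }
}

// A tag whose type parameter fixes the element type of its output.
final class TypedTag<T> {
    final String id;

    TypedTag(String id) {
        this.id = id;
    }
}

// Collects emitted values per tag; tags are compared by identity.
final class TaggedOutputs {
    private final Map<TypedTag<?>, List<Object>> buffers = new HashMap<>();

    <T> void emit(TypedTag<T> tag, T value) {
        buffers.computeIfAbsent(tag, t -> new ArrayList<>()).add(value);
    }

    @SuppressWarnings("unchecked")
    <T> List<T> get(TypedTag<T> tag) {
        // Safe because emit() only stores T values under a TypedTag<T>.
        return (List<T>) (List<?>) buffers.getOrDefault(tag, new ArrayList<>());
    }
}
```

Note that the two outputs have different element types (Integer and
String), which is what point 2 above says split/select cannot express.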

Best,

Dawid

On 17/06/2019 08:51, Jark Wu wrote:
> +1 to keep the split/select API. I think if there are some problems with
> the API, it's better to fix them instead of deprecating them.
> Also, split/select are straightforward and convenient APIs. They are
> worth keeping.
>
> Regards,
> Jark
>
> On Mon, 17 Jun 2019 at 14:46, vino yang <yanghua1...@gmail.com> wrote:
>
>> Hi,
>>
>> I also think it is valuable and reasonable to keep the split/select APIs.
>> They are very convenient and widely used in our platform. I think they are
>> also used in other users' jobs.
>> If the community has doubts about this, IMHO, it would be better to start a
>> user survey.
>>
>> Best,
>> Vino
>>
>> SHI Xiaogang <shixiaoga...@gmail.com> wrote on Mon, 17 Jun 2019 at 11:55:
>>
>>> Hi Xingcan,
>>>
>>> Thanks for bringing it up for discussion.
>>>
>>> I agree with you that we should not deprecate the split/select methods.
>>> Their semantics are very clear and they are widely adopted by Flink
>>> users.
>>> We should fix these problems instead of simply deprecating the methods.
>>>
>>> Regards,
>>> Xiaogang
>>>
>>> Xingcan Cui <xingc...@gmail.com> wrote on Sat, 15 Jun 2019 at 16:13:
>>>
>>>> Hi all,
>>>>
>>>> Recently, I noticed that the split/select methods in the DataStream API
>>>> have been marked as deprecated since 1.7.2 and 1.8.0 (the related JIRA
>>>> issue is FLINK-11084 <https://issues.apache.org/jira/browse/FLINK-11084>).
>>>>
>>>> Although the two methods can be replaced by the more powerful side
>>>> output feature [1], I still doubt whether we should really remove them
>>>> in the future.
>>>>
>>>> 1. Semantically, split/select is the reverse operation of the union
>>>> transformation. Without them, the DataStream API seems to be missing a
>>>> piece.
>>>>
>>>> 2. In terms of accessibility, side outputs only work for process
>>>> functions, which forces the user to dive into a lower-level API.
>>>>
>>>> According to FLINK-11084
>>>> <https://issues.apache.org/jira/browse/FLINK-11084>, there are some
>>>> problems with the current implementation of the two methods. Maybe we
>>>> should fix the problems and reactivate them. Or, if they really need
>>>> to be deprecated, we should at least mark them as such in the
>>>> corresponding documentation : )
>>>>
>>>> What do you think?
>>>>
>>>> Best,
>>>> Xingcan
>>>>
>>>> [1]
>>>> https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/side_output.html
