[ 
https://issues.apache.org/jira/browse/FLINK-18627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166290#comment-17166290
 ] 

Roey Shem Tov edited comment on FLINK-18627 at 7/28/20, 11:16 AM:
------------------------------------------------------------------

[~aljoscha],

The semantic improvement is that it becomes easier to send filtered-out records
to a side output.

 

Take this example (I changed it slightly to match the semantics of the PR):
{code:java}
// The records are integers, so the tag is typed OutputTag<Integer>.
final OutputTag<Integer> corruptedData = new OutputTag<Integer>("side-output"){};

// filter(predicate, outputTag) is the overload proposed in the PR.
SingleOutputStreamOperator<Integer> stream = datastream
    .filter(i -> i % 2 == 0, corruptedData)
    .filter(i -> i % 3 == 0, corruptedData)
    .filter(i -> i % 4 == 0, corruptedData)
    .filter(i -> i % 5 == 0, corruptedData);

// All records rejected by at least one filter, i.e. not divisible by 2, 3, 4 and 5 together.
DataStream<Integer> corruptedDataStream = stream.getSideOutput(corruptedData);
{code}
In the above case I get a new stream with all the corrupted data.
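
To make the intended semantics concrete, here is a minimal plain-Java model of the filter chain. It does not use Flink at all, and the names `FilterChainModel`, `Result` and `run` are made up for illustration. It only shows which records would reach the main output and which would reach the side output under the behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Plain-Java model (no Flink) of a filter chain that keeps the rejected records. */
final class FilterChainModel {

    /** Records that passed every filter, and records rejected by at least one filter. */
    static final class Result<T> {
        final List<T> accepted = new ArrayList<>();
        final List<T> rejected = new ArrayList<>();
    }

    /** Applies the filters in order; the first failing filter routes the record to 'rejected'. */
    static <T> Result<T> run(List<T> input, List<Predicate<T>> filters) {
        Result<T> result = new Result<>();
        for (T record : input) {
            boolean passedAll = true;
            for (Predicate<T> filter : filters) {
                if (!filter.test(record)) {
                    passedAll = false;
                    break;
                }
            }
            (passedAll ? result.accepted : result.rejected).add(record);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Predicate<Integer>> filters = List.of(
                i -> i % 2 == 0,
                i -> i % 3 == 0,
                i -> i % 4 == 0,
                i -> i % 5 == 0);

        Result<Integer> r = run(List.of(12, 30, 60, 7, 120), filters);
        System.out.println("main output: " + r.accepted); // [60, 120]
        System.out.println("side output: " + r.rejected); // [12, 30, 7]
    }
}
```

Only 60 and 120 are divisible by 2, 3, 4 and 5, so they stay in the main output. Every other record is kept in the rejected list instead of being dropped.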

Of course, corrupted data is only one example; there are more examples I can
share.

 

I agree that filter should filter data, but it would be a nice-to-have feature
if all filtered-out records went to a given OutputTag instead of just being dropped.

Of course you can implement it yourself (by extending RichFilterFunction and
sending all the filtered-out data to a given output), but I think
it is a nice wrapper that will be useful.
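
For comparison, a hand-written version might look like the sketch below. In the current DataStream API, side outputs are emitted through the `Context` of a `ProcessFunction` (`ctx.output`), so this sketch is based on `ProcessFunction` rather than on a filter function. The class `FilterToSideOutput`, the tag name and the job name are hypothetical and not part of Flink.

```java
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class FilterWithSideOutputSketch {

    /** Hypothetical helper: forwards matching records, sends the rest to a side output. */
    static class FilterToSideOutput extends ProcessFunction<Integer, Integer> {
        private final FilterFunction<Integer> predicate;
        private final OutputTag<Integer> rejectedTag;

        FilterToSideOutput(FilterFunction<Integer> predicate, OutputTag<Integer> rejectedTag) {
            this.predicate = predicate;
            this.rejectedTag = rejectedTag;
        }

        @Override
        public void processElement(Integer value, Context ctx, Collector<Integer> out)
                throws Exception {
            if (predicate.filter(value)) {
                out.collect(value);             // record matches: main output
            } else {
                ctx.output(rejectedTag, value); // record does not match: side output
            }
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Anonymous subclass so that the element type of the tag is retained.
        final OutputTag<Integer> rejected = new OutputTag<Integer>("rejected") {};

        SingleOutputStreamOperator<Integer> even = env
                .fromElements(1, 2, 3, 4, 5, 6)
                .process(new FilterToSideOutput(i -> i % 2 == 0, rejected));

        DataStream<Integer> odd = even.getSideOutput(rejected);

        even.print("even");
        odd.print("odd");
        env.execute("filter-with-side-output");
    }
}
```

Note that `getSideOutput` only returns what that one operator emitted. When several such operators are chained, the rejected streams of the individual stages would have to be combined, for example with `union`.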


> Get unmatch filter method records to side output
> ------------------------------------------------
>
>                 Key: FLINK-18627
>                 URL: https://issues.apache.org/jira/browse/FLINK-18627
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>            Reporter: Roey Shem Tov
>            Priority: Major
>             Fix For: 1.12.0
>
>
> Records that do not match a filter function should somehow be sent to a side output.
> Example:
>  
> {code:java}
> datastream
> .filter(i->i%2==0)
> .sideOutput(oddNumbersSideOutput);
> {code}
>  
>  
> That way we can filter multiple times and send the filtered-out records to our
> side output instead of dropping them immediately; this can be useful in many ways.
>  
> What do you think?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
