Hi Nikola,

Side outputs are definitely at least as efficient as using two filters, but they are also harder to implement and maintain. Do you actually have a use case where every bit of performance counts?
If so, please also check enableObjectReuse [1] and look into serialization [2]. Also, if you can implement your use case with Table API/SQL (with UDFs), it will be much faster than the other alternatives.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/execution/execution_configuration/
[2] https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html

On Mon, May 10, 2021 at 4:52 PM Taher Koitawala <taher...@gmail.com> wrote:

> I think what you're looking for is a side output. Change the logic to a
> process function. What is true goes to the collector; what is false can go
> to a side output. That now gives you 2 streams.
>
> On Mon, May 10, 2021, 8:14 PM Nikola Hrusov <n.hru...@gmail.com> wrote:
>
>> Hi Arvid,
>>
>> In my case it's the latter, thus I have also thought about using the
>> filter (map is not useful in my case).
>>
>> What I am not sure about is which is better to use.
>> In what case would you split a stream with side output and in what case
>> with filter?
>> Would there be any performance gain/pain based on which is used?
>>
>> Regards,
>> Nikola
>>
>>
>> On Mon, May 10, 2021 at 6:00 PM Arvid Heise <ar...@apache.org> wrote:
>>
>>> Hi Nikola,
>>>
>>> if you just want to apply a different user function to the records
>>> depending on the property "exist" the simplest way is to use
>>>
>>> source -> map(if exist do this else that) -> sink
>>>
>>> If it turns out that you want to apply a different subgraph, you can do
>>>
>>> source -> filter(if exist)     -> do this -> union -> sink
>>> source -> filter(if not exist) -> do that -^
>>>
>>> On Mon, May 10, 2021 at 3:07 PM Nikola Hrusov <n.hru...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to find some information on what is the best way to split a
>>>> stream of the same data.
>>>>
>>>> For the given scenario: I have an object which has a property "exist"
>>>>
>>>> I want to split the stream based on this property, do something, and
>>>> afterwards join it again into a single stream.
>>>>
>>>> Initial (A) -> Split stream based on exist (B) or not (C) -> union both
>>>> streams (D)
>>>>
>>>> I could find some similar topics on StackOverflow:
>>>> - https://stackoverflow.com/questions/53588554/apache-flink-using-filter-or-split-to-split-a-stream
>>>> - https://stackoverflow.com/questions/61752728/how-to-get-output-of-the-values-that-are-not-matched-in-filter-function-in-apach
>>>>
>>>> but none of them really gives a definitive answer.
>>>>
>>>> What I am thinking about is using 1) filter or 2) side output.
>>>>
>>>> I know that one of the use cases of side output is that it can have
>>>> different data types. That is not my case as it will be the same object
>>>> going through the whole pipeline.
>>>>
>>>> So both options look more or less the same to me, however I do not know
>>>> the Flink internals as well as I would like to at this point.
>>>>
>>>> Can some of you guys shed some light and perhaps tell me if I am
>>>> mistaken in my thoughts?
>>>>
>>>> Thanks.
>>>>
>>>> Regards,
>>>> Nikola
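
To make the filter vs. side output comparison from this thread concrete, here is a minimal plain-Java sketch. It deliberately uses no Flink classes, so it is only a model of the two operator graphs, not Flink code. The names SplitSketch, Event and CountingCheck are hypothetical and exist only for this illustration. It counts how often the "exist" check runs when the stream is split with two filters versus one routing function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the two split strategies; no Flink classes are used.
final class SplitSketch {

    // Hypothetical record type standing in for the object with the "exist" property.
    static final class Event {
        final String id;
        final boolean exist;

        Event(String id, boolean exist) {
            this.id = id;
            this.exist = exist;
        }
    }

    // Wraps the "exist" check and counts how many times it is evaluated.
    static final class CountingCheck {
        int calls = 0;

        boolean test(Event e) {
            calls++;
            return e.exist;
        }
    }

    // Option 1: two independent filters over the same input.
    // Every record is checked twice, once per filter.
    static List<List<Event>> splitWithTwoFilters(List<Event> input, CountingCheck check) {
        List<Event> matched = new ArrayList<>();
        List<Event> unmatched = new ArrayList<>();
        for (Event e : input) {            // models filter(if exist)
            if (check.test(e)) matched.add(e);
        }
        for (Event e : input) {            // models filter(if not exist)
            if (!check.test(e)) unmatched.add(e);
        }
        return Arrays.asList(matched, unmatched);
    }

    // Option 2: one routing function, as with a main output plus a side output.
    // Every record is checked exactly once.
    static List<List<Event>> splitWithRouter(List<Event> input, CountingCheck check) {
        List<Event> matched = new ArrayList<>();
        List<Event> unmatched = new ArrayList<>();
        for (Event e : input) {
            if (check.test(e)) {
                matched.add(e);            // models the main output
            } else {
                unmatched.add(e);          // models the side output
            }
        }
        return Arrays.asList(matched, unmatched);
    }

    public static void main(String[] args) {
        List<Event> input = Arrays.asList(
                new Event("a", true), new Event("b", false),
                new Event("c", true), new Event("d", false));

        CountingCheck filterCheck = new CountingCheck();
        CountingCheck routerCheck = new CountingCheck();
        splitWithTwoFilters(input, filterCheck);
        splitWithRouter(input, routerCheck);

        System.out.println("two filters: " + filterCheck.calls + " checks"); // 8
        System.out.println("router: " + routerCheck.calls + " checks");      // 4
    }
}
```

In an actual Flink job, option 2 would roughly correspond to a ProcessFunction that emits matching records through its Collector and the rest through ctx.output(...) with an OutputTag, read back via getSideOutput(...) and merged again with union. Whether the saved predicate calls matter in practice depends on the job; as noted above, object reuse and serialization are likely to have a larger effect.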