I see, thank you, Roman!

On Tue, May 12, 2020 at 4:59 PM Khachatryan Roman <khachatryan.ro...@gmail.com> wrote:
> Thanks for the clarification.
>
> Apparently, the second option (with the enricher) creates more load by
> adding configuration to every event. Unless events are much bigger than the
> configuration, this will significantly increase network, memory, and CPU
> usage.
> By the way, I don't think you need a broadcast in the second option, since
> the interested subtask will receive the configuration anyway.
>
> Regards,
> Roman
>
>
> On Tue, May 12, 2020 at 5:57 AM Manas Kale <manaskal...@gmail.com> wrote:
>
>> Sure. Apologies for not making this clear enough.
>>
>> > each operator only stores what it needs.
>>
>> Let's imagine this setup:
>>
>>                  BROADCAST STREAM
>> config-stream ---------------------------------------------------------
>>                       |                       |                       |
>> event-stream ---> operator1 ------------> operator2 ------------> operator3
>>
>> In this scenario, all 3 operators will be BroadcastProcessFunctions. Each
>> of them will receive the whole config message in its
>> processBroadcastElement method, but each one will only store what it
>> needs in its state store. So even though operator1 will receive
>>
>> config = {
>>   "config1" : 1,
>>   "config2" : 2,
>>   "config3" : 3
>> }
>>
>> it will only store config1.
>>
>> > each downstream operator will "strip off" the config parameter that it
>> > needs.
>>
>>                  BROADCAST STREAM
>> config-stream --------
>>                      |
>> event-stream ---> enricher ---> operator1 ---> operator2 ---> operator3
>>
>> In this case, the enricher operator will store the whole config message.
>> When an event message arrives, this operator will append config1, config2,
>> and config3 to it. Operator 1 will extract and use config1 and output a
>> message with config1 stripped off.
>>
>> I hope that helps!
>>
>> Perhaps I am being too pedantic, but I would like to know whether these two
>> methods differ noticeably in performance and, if so, which one would be
>> preferred.
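The two designs described above can be sketched in plain Python, purely to compare what each operator holds and what each record carries between operators. This is not Flink code: `FilteringOperator`, `Enricher`, `StrippingOperator`, and `run_pipeline` are hypothetical stand-ins (the first loosely mirrors a BroadcastProcessFunction and its processBroadcastElement method), and the JSON length of each record is only a rough proxy for serialization cost.

```python
import json
from typing import Any, Dict, List, Tuple

Config = Dict[str, Any]
Event = Dict[str, Any]


class FilteringOperator:
    """Approach 1 (hypothetical stand-in): sees the whole config, stores one key."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.state: Config = {}  # plays the role of per-operator broadcast state

    def process_broadcast_element(self, config: Config) -> None:
        if self.key in config:
            self.state[self.key] = config[self.key]

    def process_element(self, event: Event) -> Event:
        # The event carries no config; the parameter comes from local state.
        return {**event, "used_" + self.key: self.state.get(self.key)}


class Enricher:
    """Approach 2 (hypothetical stand-in): stores the whole config, attaches it to events."""

    def __init__(self) -> None:
        self.config: Config = {}

    def process_broadcast_element(self, config: Config) -> None:
        self.config = dict(config)

    def process_element(self, event: Event) -> Event:
        return {**event, "config": dict(self.config)}


class StrippingOperator:
    """Approach 2 (hypothetical stand-in): pops its own key before forwarding."""

    def __init__(self, key: str) -> None:
        self.key = key

    def process_element(self, event: Event) -> Event:
        remaining = dict(event["config"])
        value = remaining.pop(self.key)  # extract this operator's parameter
        return {**event, "config": remaining, "used_" + self.key: value}


def run_pipeline(config: Config, event: Event, enrich: bool) -> Tuple[Event, List[int]]:
    """Push one config and one event through three operators.

    Returns the final event and the JSON size of the event as it leaves
    each stage (a rough proxy for per-record network cost).
    """
    keys = ["config1", "config2", "config3"]
    sizes: List[int] = []
    if enrich:
        enricher = Enricher()
        enricher.process_broadcast_element(config)
        event = enricher.process_element(event)
        sizes.append(len(json.dumps(event)))
        for op in (StrippingOperator(k) for k in keys):
            event = op.process_element(event)
            sizes.append(len(json.dumps(event)))
    else:
        ops = [FilteringOperator(k) for k in keys]
        for op in ops:
            op.process_broadcast_element(config)  # broadcast reaches every operator
        for op in ops:
            event = op.process_element(event)
            sizes.append(len(json.dumps(event)))
    return event, sizes


if __name__ == "__main__":
    cfg = {"config1": 1, "config2": 2, "config3": 3}
    for enrich in (False, True):
        final, sizes = run_pipeline(cfg, {"id": 7}, enrich)
        print("enricher" if enrich else "broadcast", sizes)
```

Both variants end with the same parameters applied, but in the enricher variant every record also carries the not-yet-consumed parameters across each hop, which is the extra network, memory, and CPU load Roman points out.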
>>
>>
>> On Mon, May 11, 2020 at 11:46 PM Khachatryan Roman <
>> khachatryan.ro...@gmail.com> wrote:
>>
>>> Hi Manas,
>>>
>>> The approaches you described look the same:
>>> > each operator only stores what it needs.
>>> > each downstream operator will "strip off" the config parameter that it
>>> > needs.
>>>
>>> Can you please explain the difference?
>>>
>>> Regards,
>>> Roman
>>>
>>>
>>> On Mon, May 11, 2020 at 8:07 AM Manas Kale <manaskal...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> I have a single broadcast message that contains configuration data
>>>> consumed by different operators. For example:
>>>>
>>>> config = {
>>>>   "config1" : 1,
>>>>   "config2" : 2,
>>>>   "config3" : 3
>>>> }
>>>>
>>>> Operator 1 will consume config1 only, operator 2 will consume config2
>>>> only, etc.
>>>>
>>>> - Right now, in my implementation, the config message gets broadcast
>>>>   to operators 1, 2, and 3, and each operator only stores what it needs.
>>>>
>>>> - A different approach would be to broadcast the config message to
>>>>   a single root operator. This operator would then enrich the event data
>>>>   flowing through it with config1, config2, and config3, and each
>>>>   downstream operator will "strip off" the config parameter that it
>>>>   needs.
>>>>
>>>> *I was wondering which approach would be best performance-wise.* I
>>>> don't really have the time to implement both and compare, so perhaps
>>>> someone here already knows whether one approach is better or both
>>>> provide similar performance.
>>>>
>>>> FWIW, the config stream is very sporadic compared to the event stream.
>>>>
>>>> Thank you,
>>>> Manas Kale
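To put the note that the config stream is sporadic into rough numbers, here is a back-of-envelope model, assuming a fixed size per configuration parameter, the same parallelism for every operator, and counting only configuration bytes moved between operators. The function names and all inputs (10 config updates, 1,000,000 events, 20 bytes per parameter, parallelism 4) are hypothetical, and the model ignores serialization framing, state size, and any savings from operator chaining.

```python
def broadcast_overhead(config_updates: int, n_ops: int,
                       param_bytes: int, parallelism: int) -> int:
    """Approach 1 (hypothetical model): config bytes sent to operators.

    Every parallel instance of every operator receives the full config
    (n_ops parameters) on each update.
    """
    full_config_bytes = n_ops * param_bytes
    return config_updates * full_config_bytes * n_ops * parallelism


def enrichment_overhead(events: int, n_ops: int, param_bytes: int) -> int:
    """Approach 2 (hypothetical model): config bytes carried by events.

    The hop into operator k (1-based) still carries n_ops - k + 1 parameters,
    so one event carries n + (n - 1) + ... + 1 = n(n + 1)/2 parameters in total.
    The sporadic config delivery to the enricher itself is ignored.
    """
    params_per_event = n_ops * (n_ops + 1) // 2
    return events * params_per_event * param_bytes


if __name__ == "__main__":
    # Hypothetical workload: 10 config updates vs. 1,000,000 events.
    print(broadcast_overhead(10, 3, 20, 4))       # 7200
    print(enrichment_overhead(1_000_000, 3, 20))  # 120000000
```

Under these assumptions, the broadcast design's cost grows with the number of config updates, while the enricher design's cost grows with the number of events, so with a sporadic config stream the broadcast design moves far less configuration data.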