Re: Broadcast state vs data enrichment

Manas Kale Wed, 13 May 2020 23:29:24 -0700

I see, thank you Roman!
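
For a rough sense of the cost Roman describes in the quoted reply, here is a small back-of-the-envelope sketch. It is not Flink code: `wire_sizes` is a hypothetical helper, and JSON is only a stand-in for whatever serializer the job really uses. Actual numbers will also depend on whether the operators are chained, so treat the output as an illustration of the trend, not a measurement.

```python
import json


def wire_sizes(event, config, operator_keys):
    """Return the JSON-encoded size of the event entering each operator.

    option1: config is broadcast to every operator, events travel unchanged.
    option2: an enricher appends the whole config, and each operator strips
             off the one key it needs before forwarding the event.
    """
    plain = len(json.dumps(event).encode("utf-8"))
    option1 = [plain] * len(operator_keys)  # the event is never enriched

    option2 = []
    enriched = {**event, **config}  # what the (hypothetical) enricher emits
    for key in operator_keys:
        # Size of the event as it arrives at the operator that consumes `key`.
        option2.append(len(json.dumps(enriched).encode("utf-8")))
        # That operator strips its own key before passing the event on.
        enriched = {k: v for k, v in enriched.items() if k != key}
    return option1, option2


if __name__ == "__main__":
    config = {"config1": 1, "config2": 2, "config3": 3}
    o1, o2 = wire_sizes({"id": 7}, config, ["config1", "config2", "config3"])
    print("option 1 bytes per hop:", o1)
    print("option 2 bytes per hop:", o2)
```

With a small event, the enriched copies in option 2 are larger on every hop, which is the extra load on the event path. Since the config stream is sporadic, the cost of broadcasting it in option 1 is paid only when the config changes.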

On Tue, May 12, 2020 at 4:59 PM Khachatryan Roman <
khachatryan.ro...@gmail.com> wrote:


> Thanks for the clarification.
>
> Apparently, the second option (with enricher) creates more load by adding
> configuration to every event. Unless events are much bigger than the
> configuration, this will significantly increase network, memory, and CPU
> usage.
> Btw, I think you don't need a broadcast in the 2nd option, since the
> interested subtask will receive the configuration anyway.
>
> Regards,
> Roman
>
>
> On Tue, May 12, 2020 at 5:57 AM Manas Kale <manaskal...@gmail.com> wrote:
>
>> Sure. Apologies for not making this clear enough.
>>
>> > each operator only stores what it needs.
>> Let's imagine this setup:
>>
>> BROADCAST STREAM
>> config-stream --------------------------------------------------------
>>                          |                     |                     |
>> event-stream ------> operator1 ----------> operator2 ----------> operator3
>>
>>
>> In this scenario, all 3 operators will be BroadcastProcessFunctions. Each
>> of them will receive the whole config message in their
>> processBroadcastElement method, but each one will only store what it
>> needs in their state store. So even though operator1 will receive
>>  config = {
>> "config1" : 1,
>> "config2" : 2,
>> "config3" : 3
>> }
>> it will only store config1.
>>
>> > each downstream operator will "strip off" the config parameter that it
>> > needs.
>>
>> BROADCAST STREAM
>> config-stream ---------
>>                       |
>> event-stream ----> enricher ----> operator1 ----> operator2 ----> operator3
>>
>> In this case, the enricher operator will store the whole config message.
>> When an event message arrives, this operator will append config1, config2
>> and config3 to it. Operator 1 will extract and use config1, and output a
>> message that has config1 stripped off.
>>
>> I hope that helps!
>>
>> Perhaps I am being too pedantic, but I would like to know whether these
>> two methods differ in performance and, if so, which one would be
>> preferred.
>>
>>
>>
>>
>> On Mon, May 11, 2020 at 11:46 PM Khachatryan Roman <
>> khachatryan.ro...@gmail.com> wrote:
>>
>>> Hi Manas,
>>>
>>> The approaches you described look the same:
>>> > each operator only stores what it needs.
>>> > each downstream operator will "strip off" the config parameter that it
>>> > needs.
>>>
>>> Can you please explain the difference?
>>>
>>> Regards,
>>> Roman
>>>
>>>
>>> On Mon, May 11, 2020 at 8:07 AM Manas Kale <manaskal...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> I have a single broadcast message that contains configuration data
>>>> consumed by different operators. For example:
>>>> config = {
>>>> "config1" : 1,
>>>> "config2" : 2,
>>>> "config3" : 3
>>>> }
>>>>
>>>> Operator 1 will consume config1 only, operator 2 will consume config2
>>>> only, and so on.
>>>>
>>>>
>>>>    - Right now in my implementation the config message gets broadcast
>>>>    over operators 1, 2, 3 and each operator only stores what it needs.
>>>>
>>>>
>>>>    - A different approach would be to broadcast the config message to
>>>>    a single root operator. This will then enrich event data flowing
>>>>    through it with config1, config2 and config3, and each downstream
>>>>    operator will "strip off" the config parameter that it needs.
>>>>
>>>>
>>>> *I was wondering which approach would be best performance-wise.*
>>>> I don't really have the time to implement both and compare, so perhaps
>>>> someone here already knows whether one approach is better or both
>>>> perform similarly.
>>>>
>>>> FWIW, the config stream is very sporadic compared to the event stream.
>>>>
>>>> Thank you,
>>>> Manas Kale
>>>>
>>>>
>>>>
>>>>
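
The two topologies discussed in this thread can be sketched as a small plain-Python simulation. This is only an illustration under stated assumptions: it uses no Flink APIs, and every class and function name here (`BroadcastOperator`, `Enricher`, `StrippingOperator`, `run_option1`, `run_option2`) is hypothetical. The method names merely echo `processBroadcastElement` from the thread.

```python
CONFIG = {"config1": 1, "config2": 2, "config3": 3}
KEYS = ("config1", "config2", "config3")


class BroadcastOperator:
    """Option 1: receives the whole config but stores only its own key."""

    def __init__(self, key):
        self.key = key
        self.state = None  # stands in for the operator's broadcast state

    def process_broadcast_element(self, config):
        self.state = config[self.key]  # keep only what this operator needs

    def process_element(self, event):
        # The event carries no config; the value comes from local state.
        seen = event.get("seen", []) + [(self.key, self.state)]
        return {**event, "seen": seen}


class Enricher:
    """Option 2: the only operator that receives and stores the config."""

    def __init__(self):
        self.state = {}

    def process_broadcast_element(self, config):
        self.state = dict(config)

    def process_element(self, event):
        return {**event, **self.state}  # every event now carries all keys


class StrippingOperator:
    """Option 2: reads its config key from the event, then strips it off."""

    def __init__(self, key):
        self.key = key

    def process_element(self, event):
        out = dict(event)
        value = out.pop(self.key)
        out["seen"] = out.get("seen", []) + [(self.key, value)]
        return out


def run_option1(event, config=CONFIG):
    ops = [BroadcastOperator(k) for k in KEYS]
    for op in ops:
        op.process_broadcast_element(config)  # config reaches every operator
    for op in ops:
        event = op.process_element(event)
    return event


def run_option2(event, config=CONFIG):
    enricher = Enricher()
    enricher.process_broadcast_element(config)  # config reaches one operator
    event = enricher.process_element(event)
    for key in KEYS:
        event = StrippingOperator(key).process_element(event)
    return event
```

Both functions produce the same final event, so the choice is about cost, not results: option 1 keeps events small and pays only when the sporadic config stream fires, while option 2 carries config fields on every event between the enricher and the operator that strips them.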
