Hi,

I've recently published a blog post about Broadcast State [1].

Cheers,
Fabian

[1] https://data-artisans.com/blog/a-practical-guide-to-broadcast-state-in-apache-flink

2018-08-20 3:58 GMT+02:00 Paul Lam <paullin3...@gmail.com>:

> Hi Rong, Hequn
>
> Your answers are very helpful! Thank you!
>
> Best Regards,
> Paul Lam
>
> On Aug 19, 2018, at 23:30, Rong Rong <walter...@gmail.com> wrote:
>
> Hi Paul,
>
> To add to Hequn's answer: broadcast state is typically used for "a
> low-throughput stream containing a set of rules which we want to evaluate
> against all elements coming from another stream" [1].
> So one more difference to add to the list is that the data is broadcast
> across all keys when the other stream is keyed. This is typically needed
> when the same key field cannot be derived for both streams with a
> KeySelector in a CoStream.
> Another difference is performance: the broadcast state is "stored
> locally and is used to process all incoming elements on the other stream"
> and therefore its size has to be managed carefully.
>
> [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/broadcast_state.html
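To make the point about keys concrete, here is a minimal plain-Java model of the behaviour described above. It is a toy sketch, not Flink code: the names BroadcastModel, Instance, broadcastRule and sendEvent are hypothetical, and the hash-based routing only stands in for keyed partitioning. Each rule is copied to every parallel instance, while each event goes to exactly one instance chosen by its key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (not Flink API): rules reach every instance, events reach one. */
class BroadcastModel {

    /** One parallel operator instance holding its own local copy of the rules. */
    static final class Instance {
        final Map<String, Integer> rules = new LinkedHashMap<>(); // rule name -> threshold
        final List<String> alerts = new ArrayList<>();

        // Broadcast side: the only place where the local rule copy is modified.
        void onRule(String name, int threshold) {
            rules.put(name, threshold);
        }

        // Keyed side: reads the rules and evaluates all of them against the event.
        void onEvent(String key, int value) {
            for (Map.Entry<String, Integer> rule : rules.entrySet()) {
                if (value > rule.getValue()) {
                    alerts.add(key + " exceeds " + rule.getKey());
                }
            }
        }
    }

    final List<Instance> instances = new ArrayList<>();

    BroadcastModel(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            instances.add(new Instance());
        }
    }

    /** Every instance receives the rule, regardless of any key. */
    void broadcastRule(String name, int threshold) {
        for (Instance instance : instances) {
            instance.onRule(name, threshold);
        }
    }

    /** Only the instance responsible for the key receives the event. */
    void sendEvent(String key, int value) {
        int index = Math.floorMod(key.hashCode(), instances.size());
        instances.get(index).onEvent(key, value);
    }

    public static void main(String[] args) {
        BroadcastModel model = new BroadcastModel(3);
        model.broadcastRule("max-10", 10);
        model.sendEvent("user-a", 42);
        model.sendEvent("user-b", 5);
        for (Instance instance : model.instances) {
            System.out.println(instance.rules.keySet() + " " + instance.alerts);
        }
    }
}
```

Because every instance keeps a full copy of the rules, the memory cost in this model grows with parallelism times rule-set size, which is one way to read the advice about keeping the broadcast side small.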
>
> On Sun, Aug 19, 2018 at 1:40 AM Hequn Cheng <chenghe...@gmail.com> wrote:
>
>> Hi Paul,
>>
>> There are some differences:
>> 1. The BroadcastStream broadcasts the data for you, i.e., data is
>> broadcast to all downstream tasks automatically.
>> 2. To guarantee that the contents of the Broadcast State are the same
>> across all parallel instances of the operator, read-write access is given
>> only to the broadcast side.
>> 3. For BroadcastState, Flink guarantees that upon restoring/rescaling
>> there will be no duplicates and no missing data. In case of recovery with
>> the same or smaller parallelism, each task reads its checkpointed state.
>> Upon scaling up, each task reads its own state, and the remaining tasks
>> (p_new - p_old) read checkpoints of previous tasks in a round-robin manner.
>> MapState does not have these abilities.
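The restore rule in point 3 can be sketched as a small plain-Java function. This is a simplified model of the description above, not Flink's actual implementation; the class name BroadcastRestoreModel and the method assign are hypothetical. It returns, for each new task index, the index of the old task whose checkpointed broadcast state it reads.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the restore rule described above; not Flink's implementation. */
class BroadcastRestoreModel {

    /**
     * For each new task index, returns the index of the old task whose
     * checkpointed broadcast state that task reads on restore.
     */
    static List<Integer> assign(int oldParallelism, int newParallelism) {
        if (oldParallelism <= 0 || newParallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        List<Integer> source = new ArrayList<>();
        for (int task = 0; task < newParallelism; task++) {
            // Tasks below oldParallelism read their own state; the remaining
            // (newParallelism - oldParallelism) tasks wrap around round-robin.
            source.add(task % oldParallelism);
        }
        return source;
    }

    public static void main(String[] args) {
        System.out.println(assign(2, 5)); // scale up:   [0, 1, 0, 1, 0]
        System.out.println(assign(4, 2)); // scale down: [0, 1]
    }
}
```

Since all old instances are assumed to hold identical contents (point 2), reading any single old copy gives a new task the complete state, which is why this scheme yields neither duplicates nor gaps.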
>>
>> Best, Hequn
>>
>> On Sun, Aug 19, 2018 at 11:18 AM, Paul Lam <paullin3...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> AFAIK, the difference between a BroadcastStream and a normal DataStream
>>> is that the BroadcastStream comes with a BroadcastState, but it seems that
>>> the functionality of BroadcastState can also be achieved with MapState in a
>>> CoMapFunction or something similar, since the control stream can still be
>>> broadcast without being turned into a BroadcastStream. So I’m wondering
>>> what the advantage of using BroadcastState is. Thanks a lot!
>>>
>>> Best Regards,
>>> Paul Lam
>>>
>>
>>
>
