Hi Paul,

There are some differences:
1. A BroadcastStream broadcasts the data for you, i.e., every element is
sent to all downstream tasks automatically.
2. To guarantee that the contents of the Broadcast State are the same
across all parallel instances of the operator, read-write access is given
only to the broadcast side; the non-broadcast side has read-only access.
3. For BroadcastState, Flink guarantees that there will be no duplicates
and no missing data upon restoring or rescaling. When recovering with the
same or smaller parallelism, each task reads its own checkpointed state.
When scaling up, each existing task reads its own state, and the
additional tasks (p_new - p_old) read the checkpoints of previous tasks in
a round-robin manner. MapState, by contrast, does not offer this.
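
As a rough illustration of point 3, the restore assignment can be sketched
in plain Java. This is not Flink code: the class and method names are
hypothetical, and the modulo mapping is only an assumed reading of
"round-robin". It shows which old task's checkpoint each new task would
read.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch, not Flink internals. All names here are hypothetical.
class BroadcastRescaleSketch {

    // For each new task index, return the index of the old task whose
    // checkpointed broadcast state it would read on restore.
    static List<Integer> assignCheckpoints(int oldParallelism, int newParallelism) {
        List<Integer> source = new ArrayList<>();
        for (int task = 0; task < newParallelism; task++) {
            // Tasks with index < oldParallelism read their own state; the
            // extra (p_new - p_old) tasks wrap around the old tasks in order
            // (assumed interpretation of "round-robin").
            source.add(task % oldParallelism);
        }
        return source;
    }

    public static void main(String[] args) {
        // Scaling up from 2 to 5 tasks.
        System.out.println(assignCheckpoints(2, 5)); // prints [0, 1, 0, 1, 0]
        // Scaling down from 4 to 2 tasks: each task reads its own state.
        System.out.println(assignCheckpoints(4, 2)); // prints [0, 1]
    }
}
```

Since every old task holds the same broadcast state, any of these
checkpoints gives a new task the complete contents.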

Best, Hequn

On Sun, Aug 19, 2018 at 11:18 AM, Paul Lam <paullin3...@gmail.com> wrote:

> Hi,
>
> AFAIK, the difference between a BroadcastStream and a normal DataStream is
> that the BroadcastStream comes with a BroadcastState, but it seems that the
> functionality of BroadcastState can also be achieved with MapState in a
> CoMapFunction or something similar, since the control stream can still be
> broadcast without being turned into a BroadcastStream. So, I’m wondering
> what the advantage of using BroadcastState is. Thanks a lot!
>
> Best Regards,
> Paul Lam
>
