Hi Paul,

There are some differences:

1. A BroadcastStream broadcasts data for you, i.e., each element is sent to all downstream tasks automatically.
2. To guarantee that the contents of the Broadcast State are the same across all parallel instances of the operator, read-write access is given only to the broadcast side. The non-broadcast side has read-only access.
3. For BroadcastState, Flink guarantees that upon restoring or rescaling there will be no duplicates and no missing data. In case of recovery with the same or smaller parallelism, each task reads its own checkpointed state. Upon scaling up, each task reads its own state, and the remaining (p_new - p_old) tasks read the checkpoints of previous tasks in a round-robin manner.

MapState doesn't provide these guarantees.
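To make point 3 concrete, here is a small plain-Java model of the restore assignment (no Flink dependency). It assumes that new task i reads the checkpoint of old task i % p_old, which is one way to realize the round-robin described above. The class and method names are hypothetical and this is a sketch, not Flink's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model (not Flink's code) of how broadcast-state
 * checkpoints are handed to tasks when an operator is restored with a new
 * parallelism. Assumption: new task i reads old checkpoint (i % pOld).
 */
public class BroadcastRestoreSketch {

    /** Returns, for each new task index, the index of the old checkpoint it reads. */
    static List<Integer> assignCheckpoints(int pOld, int pNew) {
        if (pOld <= 0 || pNew <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        List<Integer> assignment = new ArrayList<>(pNew);
        for (int task = 0; task < pNew; task++) {
            // Tasks 0..pOld-1 read their own checkpoint; the extra
            // (pNew - pOld) tasks wrap around over the old checkpoints.
            assignment.add(task % pOld);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Scale up from 2 to 5: tasks 2, 3, 4 reuse checkpoints 0, 1, 0.
        System.out.println(assignCheckpoints(2, 5)); // [0, 1, 0, 1, 0]
        // Same parallelism: each task reads its own checkpoint.
        System.out.println(assignCheckpoints(3, 3)); // [0, 1, 2]
        // Scale down from 4 to 2: tasks read checkpoints 0 and 1 only.
        System.out.println(assignCheckpoints(4, 2)); // [0, 1]
    }
}
```

Because every parallel copy of a broadcast state is identical (point 2), any single checkpoint holds the complete state. Each new task therefore ends up with exactly one full copy: nothing is duplicated, and skipping the extra checkpoints when scaling down loses nothing.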
Best,
Hequn

On Sun, Aug 19, 2018 at 11:18 AM, Paul Lam <paullin3...@gmail.com> wrote:
> Hi,
>
> AFAIK, the difference between a BroadcastStream and a normal DataStream is
> that the BroadcastStream is with a BroadcastState, but it seems that the
> functionality of BroadcastState can also be achieved by MapState in a
> CoMapFunction or something since the control stream is still broadcasted
> without being turned into BroadcastStream. So, I’m wondering what’s the
> advantage of using BroadcastState? Thanks a lot!
>
> Best Regards,
> Paul Lam