Hi Rong, Hequn,

Your answers are very helpful! Thank you!

Best Regards,
Paul Lam

> On Aug 19, 2018, at 23:30, Rong Rong <walter...@gmail.com> wrote:
>
> Hi Paul,
>
> To add to Hequn's answer: broadcast state can typically be used as "a
> low-throughput stream containing a set of rules which we want to evaluate
> against all elements coming from another stream" [1].
>
> So one more item for the list of differences is whether the state is
> "broadcast" across all keys when processing a keyed stream. This typically
> matters when it is not possible to derive the same key field using a
> KeySelector in a CoStream.
>
> Another difference is performance: a BroadcastStream is "stored locally and
> is used to process all incoming elements on the other stream", so its size
> has to be managed carefully.
>
> [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/broadcast_state.html
>
> On Sun, Aug 19, 2018 at 1:40 AM Hequn Cheng <chenghe...@gmail.com> wrote:
> Hi Paul,
>
> There are some differences:
> 1. The BroadcastStream broadcasts data for you, i.e., data is sent to all
>    downstream tasks automatically.
> 2. To guarantee that the contents of the Broadcast State are the same across
>    all parallel instances of our operator, read-write access is given only
>    to the broadcast side.
> 3. For BroadcastState, Flink guarantees that upon restoring/rescaling there
>    will be no duplicates and no missing data. In case of recovery with the
>    same or smaller parallelism, each task reads its checkpointed state. Upon
>    scaling up, each task reads its own state, and the remaining tasks
>    (p_new - p_old) read checkpoints of previous tasks in a round-robin
>    manner. MapState doesn't have such abilities.
>
> Best, Hequn
>
> On Sun, Aug 19, 2018 at 11:18 AM, Paul Lam <paullin3...@gmail.com> wrote:
> Hi,
>
> AFAIK, the difference between a BroadcastStream and a normal DataStream is
> that the BroadcastStream comes with a BroadcastState, but it seems that the
> functionality of BroadcastState can also be achieved with MapState in a
> CoMapFunction or something similar, since the control stream is still
> broadcast without being turned into a BroadcastStream. So I'm wondering:
> what's the advantage of using BroadcastState? Thanks a lot!
>
> Best Regards,
> Paul Lam
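
To make the scale-up rule in Hequn's point 3 concrete, here is a minimal sketch in plain Java. It is not Flink code: the class and method names are hypothetical, and it assumes "round-robin" means that new subtask i reads the checkpoint written by old subtask i % p_old, which agrees with the description that the first p_old tasks read their own state.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; none of these names exist in Flink.
final class BroadcastRestoreSketch {

    // Returns the index of the old subtask whose checkpointed broadcast
    // state the given new subtask would read (assumed rule: i % p_old).
    static int sourceSubtask(int newSubtaskIndex, int oldParallelism) {
        if (oldParallelism <= 0 || newSubtaskIndex < 0) {
            throw new IllegalArgumentException("invalid index or parallelism");
        }
        return newSubtaskIndex % oldParallelism;
    }

    // Lists the source checkpoint for every subtask after rescaling.
    static List<Integer> assignment(int oldParallelism, int newParallelism) {
        List<Integer> sources = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            sources.add(sourceSubtask(i, oldParallelism));
        }
        return sources;
    }

    public static void main(String[] args) {
        // Scaling up from 3 to 5: tasks 0..2 keep their own state,
        // tasks 3 and 4 wrap around to checkpoints 0 and 1.
        System.out.println(assignment(3, 5)); // [0, 1, 2, 0, 1]
        // Scaling down from 3 to 2: each remaining task reads its own state.
        System.out.println(assignment(3, 2)); // [0, 1]
    }
}
```

Since every parallel instance holds the same broadcast contents, any old checkpoint is a complete copy, which is why reading exactly one of them per new task gives neither duplicates nor missing data.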