Hi, I've recently published a blog post about Broadcast State [1].
Cheers, Fabian

[1] https://data-artisans.com/blog/a-practical-guide-to-broadcast-state-in-apache-flink

2018-08-20 3:58 GMT+02:00 Paul Lam <paullin3...@gmail.com>:

> Hi Rong, Hequn
>
> Your answers are very helpful! Thank you!
>
> Best Regards,
> Paul Lam
>
> On Aug 19, 2018, at 23:30, Rong Rong <walter...@gmail.com> wrote:
>
> Hi Paul,
>
> To add to Hequn's answer: broadcast state can typically be used as "a
> low-throughput stream containing a set of rules which we want to evaluate
> against all elements coming from another stream" [1].
> So one more item for the list of differences is whether the state is
> "broadcast" across all keys when processing a keyed stream. This
> typically applies when it is not possible to derive the same key field
> using a KeySelector in a CoStream.
> Another difference is performance: the BroadcastStream is "stored
> locally and is used to process all incoming elements on the other stream",
> so its size has to be managed carefully.
>
> [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/broadcast_state.html
>
> On Sun, Aug 19, 2018 at 1:40 AM Hequn Cheng <chenghe...@gmail.com> wrote:
>
>> Hi Paul,
>>
>> There are some differences:
>> 1. The BroadcastStream can broadcast data for you, i.e., data will be
>> broadcast to all downstream tasks automatically.
>> 2. To guarantee that the contents of the Broadcast State are the same
>> across all parallel instances of our operator, read-write access is only
>> given to the broadcast side.
>> 3. For BroadcastState, Flink guarantees that upon restoring/rescaling
>> there will be no duplicates and no missing data. In case of recovery with
>> the same or smaller parallelism, each task reads its checkpointed state.
>> Upon scaling up, each task reads its own state, and the remaining tasks
>> (p_new-p_old) read checkpoints of previous tasks in a round-robin manner.
>> MapState doesn't have such abilities.
>>
>> Best, Hequn
>>
>> On Sun, Aug 19, 2018 at 11:18 AM, Paul Lam <paullin3...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> AFAIK, the difference between a BroadcastStream and a normal DataStream
>>> is that the BroadcastStream comes with a BroadcastState, but it seems
>>> that the functionality of BroadcastState can also be achieved with
>>> MapState in a CoMapFunction or something similar, since the control
>>> stream is still broadcast without being turned into a BroadcastStream.
>>> So I'm wondering what the advantage of using BroadcastState is.
>>> Thanks a lot!
>>>
>>> Best Regards,
>>> Paul Lam
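
As a rough illustration of the rescaling rule described in point 3 of
Hequn's list (each task reads its own checkpointed state, and on scale-up
the extra p_new-p_old tasks read the checkpoints of previous tasks
round-robin), here is a toy model in plain Java. It does not use or
reproduce Flink's code: the class and method names are made up for this
sketch, and the modulo mapping is only an assumption about what
"round-robin" could look like.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: names and the modulo rule are assumptions for
// illustration, not Flink's actual restore logic.
public class BroadcastRescaleSketch {

    // Hypothetical helper: index of the old subtask whose checkpointed
    // broadcast state the given new subtask would read.
    // - newSubtask < oldParallelism: the task reads its own state.
    // - newSubtask >= oldParallelism: wrap around the old tasks round-robin.
    static int restoreSource(int newSubtask, int oldParallelism) {
        return newSubtask % oldParallelism;
    }

    // Builds the full plan: entry i is the old subtask read by new subtask i.
    static List<Integer> restorePlan(int oldParallelism, int newParallelism) {
        List<Integer> plan = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            plan.add(restoreSource(i, oldParallelism));
        }
        return plan;
    }

    public static void main(String[] args) {
        // Scaling up from 2 to 5 subtasks.
        System.out.println(restorePlan(2, 5));
        // Scaling down from 4 to 2 subtasks: each task reads its own state.
        System.out.println(restorePlan(4, 2));
    }
}
```

Because every parallel instance holds the same broadcast state, reading any
one old checkpoint is enough to restore a new task under this model, which
is why no data is duplicated or lost.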