Hi Rong, Hequn

Your answers are very helpful! Thank you!

Best Regards,
Paul Lam

> On Aug 19, 2018, at 23:30, Rong Rong <walter...@gmail.com> wrote:
> 
> Hi Paul,
> 
> To add to Hequn's answer: broadcast state is typically used for "a 
> low-throughput stream containing a set of rules which we want to evaluate 
> against all elements coming from another stream" [1].
> So one more item for the list of differences is whether the data is 
> "broadcast" across all keys when processing a keyed stream. This typically 
> matters when it is not possible to derive the same key field for both inputs 
> using a KeySelector in a CoStream.
> Another difference is performance: the broadcast data is "stored locally 
> and is used to process all incoming elements on the other stream", so the 
> size of the BroadcastStream has to be managed carefully.
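
To make the "rules evaluated against all elements, across all keys" point concrete, here is a minimal plain-Java sketch. It is not Flink code: the Instance class, the "max-100" rule, and the sample keys are made up for illustration, and the routing by key hash is only an assumption about how a keyed stream is partitioned. It just shows that when every parallel instance holds the same local copy of the rules, the result does not depend on which instance a key lands on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BroadcastRulesSketch {

    /** One parallel operator instance with its own local copy of the rules. */
    static final class Instance {
        final Map<String, Integer> rules = new HashMap<>(); // rule name -> max allowed value
        final List<String> alerts = new ArrayList<>();

        // Broadcast side: the only place where the local rule copy is modified.
        void onRule(String name, int maxValue) {
            rules.put(name, maxValue);
        }

        // Keyed side: reads the rules, never writes them.
        void onElement(String key, int value) {
            for (Map.Entry<String, Integer> rule : rules.entrySet()) {
                if (value > rule.getValue()) {
                    alerts.add(key + " violates " + rule.getKey());
                }
            }
        }
    }

    static List<String> run(int parallelism) {
        List<Instance> instances = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            instances.add(new Instance());
        }

        // The rule is delivered to every instance, regardless of key.
        for (Instance instance : instances) {
            instance.onRule("max-100", 100);
        }

        // Each element goes to exactly one instance, chosen by its key (assumed hash routing).
        String[] keys = {"user-a", "user-b", "user-c"};
        int[] values = {150, 50, 200};
        for (int i = 0; i < keys.length; i++) {
            int target = Math.floorMod(keys[i].hashCode(), parallelism);
            instances.get(target).onElement(keys[i], values[i]);
        }

        // Collect and sort so the result is independent of the routing.
        List<String> all = new ArrayList<>();
        for (Instance instance : instances) {
            all.addAll(instance.alerts);
        }
        Collections.sort(all);
        return all;
    }

    public static void main(String[] args) {
        System.out.println(run(1)); // [user-a violates max-100, user-c violates max-100]
        System.out.println(run(3)); // same alerts, whatever the key-to-instance routing
    }
}
```

The flip side is the size concern mentioned above: every instance keeps the full rule set, so memory use grows with the parallelism times the size of the broadcast data.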
> 
> [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/broadcast_state.html
> 
> On Sun, Aug 19, 2018 at 1:40 AM Hequn Cheng <chenghe...@gmail.com> wrote:
> Hi Paul,
> 
> There are some differences:
> 1. A BroadcastStream broadcasts data for you, i.e., the data is sent to all 
> downstream tasks automatically.
> 2. To guarantee that the contents of the broadcast state are the same across 
> all parallel instances of the operator, read-write access is given only to 
> the broadcast side.
> 3. For BroadcastState, Flink guarantees that upon restoring/rescaling there 
> will be no duplicates and no missing data. In case of recovery with the same 
> or smaller parallelism, each task reads its checkpointed state. Upon scaling 
> up, each task reads its own state, and the remaining tasks (p_new - p_old) 
> read checkpoints of previous tasks in a round-robin manner. MapState, by 
> contrast, has no such abilities.
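
The restore behaviour in point 3 can be pictured with a small plain-Java sketch. It does not use Flink's API, and the class and method names are hypothetical. It assumes that "round-robin" means new task i reads the checkpoint written by old task i mod p_old, which matches the description above but is not taken from Flink's source.

```java
import java.util.ArrayList;
import java.util.List;

final class BroadcastRescaleSketch {

    /**
     * For each new subtask index, returns the index of the old subtask
     * whose checkpointed broadcast state it would read.
     */
    static List<Integer> assignOldSubtasks(int oldParallelism, int newParallelism) {
        if (oldParallelism <= 0 || newParallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        List<Integer> sources = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            // Tasks with i < oldParallelism read their own state; the extra
            // (p_new - p_old) tasks wrap around the old tasks round-robin.
            sources.add(i % oldParallelism);
        }
        return sources;
    }

    public static void main(String[] args) {
        System.out.println(assignOldSubtasks(2, 5)); // scale up:   [0, 1, 0, 1, 0]
        System.out.println(assignOldSubtasks(4, 2)); // scale down: [0, 1]
    }
}
```

Because every old task holds an identical copy of the broadcast state, any such assignment restores the full contents without duplicates or gaps.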
> 
> Best, Hequn
> 
> On Sun, Aug 19, 2018 at 11:18 AM, Paul Lam <paullin3...@gmail.com> wrote:
> Hi, 
> 
> AFAIK, the difference between a BroadcastStream and a normal DataStream is 
> that the BroadcastStream comes with a BroadcastState. But it seems that the 
> functionality of BroadcastState can also be achieved with MapState in a 
> CoMapFunction or something similar, since the control stream can still be 
> broadcast without being turned into a BroadcastStream. So I'm wondering: 
> what is the advantage of using BroadcastState? Thanks a lot!
> 
> Best Regards,
> Paul Lam
> 
