Re: Apache Flink - broadcasting DataStream

M Singh Sat, 30 Dec 2017 14:33:20 -0800

Hi Ufuk:
Thanks for your explanation.
I can understand broadcasting a small immutable dataset to the subtasks so that 
they can be joined with a stream.  
However I am still trying to understand how will each broadcasted element from 
a stream be used in join operation with another stream.  Is this just on 
optimization over joining two streams ?  
Also, I believe that substasks are operating on partitions of a stream and only 
equi-joins are possible for streams.  So what is the reason we would like to 
broadcast each element to all the substasks ?
Thanks again.

    On Wednesday, December 27, 2017 12:52 AM, Ufuk Celebi <u...@apache.org> 
wrote:

 Hey Mans!

This refers to how sub tasks are connected to each other in your
program. If you have a single sub task A1 and three sub tasks B1, B2,
B3, broadcast will emit each incoming record at A1 to all B1, B2, B3:

A1 --+-> B1
    +-> B2
    +-> B3

Does this help?

On Mon, Dec 25, 2017 at 7:12 PM, M Singh <mans2si...@yahoo.com> wrote:
> 1 What elements get broad to the partitions ?

Each incoming element is broadcasted

> 2. What happens as new elements are added to the stream ? Are only the new
> elements broadcast ?

Yes, each incoming element is broadcasted separately without any history.

> 3. Since the broadcast operation returns a DataStream can it be used in join
> how do new (and old) elements affect the join results ?

Yes, should be. Every element is broadcasted only once.

> 4. Similarly how does broadcast work with connected streams ?

Similar to non connected streams. The incoming records are emitted to
every downstream partition.

– Ufuk

Re: Apache Flink - broadcasting DataStream

Reply via email to