Hi Ufuk: Thanks for your explanation. I can understand broadcasting a small immutable dataset to the subtasks so that they can be joined with a stream. However I am still trying to understand how will each broadcasted element from a stream be used in join operation with another stream. Is this just on optimization over joining two streams ? Also, I believe that substasks are operating on partitions of a stream and only equi-joins are possible for streams. So what is the reason we would like to broadcast each element to all the substasks ? Thanks again.
On Wednesday, December 27, 2017 12:52 AM, Ufuk Celebi <u...@apache.org> wrote: Hey Mans! This refers to how sub tasks are connected to each other in your program. If you have a single sub task A1 and three sub tasks B1, B2, B3, broadcast will emit each incoming record at A1 to all B1, B2, B3: A1 --+-> B1 +-> B2 +-> B3 Does this help? On Mon, Dec 25, 2017 at 7:12 PM, M Singh <mans2si...@yahoo.com> wrote: > 1 What elements get broad to the partitions ? Each incoming element is broadcasted > 2. What happens as new elements are added to the stream ? Are only the new > elements broadcast ? Yes, each incoming element is broadcasted separately without any history. > 3. Since the broadcast operation returns a DataStream can it be used in join > how do new (and old) elements affect the join results ? Yes, should be. Every element is broadcasted only once. > 4. Similarly how does broadcast work with connected streams ? Similar to non connected streams. The incoming records are emitted to every downstream partition. – Ufuk