Hi,

actually, I am distributing my data before the program starts, without using broadcast sets.
However, the approach should still work, under one condition:

> DataSet mapped1 =
> data.flatMap(yourMap).withBroadcastSet(smallData1,"data").setParallelism(5);
> DataSet mapped2 =
> data.flatMap(yourMap).withBroadcastSet(smallData2,"data").setParallelism(5);

Is it guaranteed that this selects disjoint sets of nodes, i.e., five nodes for mapped1 and five other nodes for mapped2? Is there any way to choose the five nodes explicitly?

Currently, I have stored the first half of the data on nodes 1-5 and the second half on nodes 6-10. With this approach, I assume the nodes are selected arbitrarily, so I would have to copy both halves to all of the nodes.

Best,
Stefan