Hi Stefan,

Flink keeps only one copy of a broadcast variable per machine and shares it among all parallel tasks running on that machine. Flink can also load the broadcast variable into a custom data structure instead of the default list.
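To give an idea of what such a custom data structure could look like, here is a minimal sketch of an initializer that turns the broadcast elements into a lookup map. The interface below is a local stand-in with the same shape as Flink's BroadcastVariableInitializer, declared only so the snippet compiles without Flink on the classpath. The class name PatternIndexInitializer and the idea of indexing grep patterns are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Local stand-in mirroring the shape of Flink's
// org.apache.flink.api.common.functions.BroadcastVariableInitializer.
// With Flink on the classpath, import the real interface instead.
interface BroadcastVariableInitializer<T, C> {
    C initializeBroadcastVariable(Iterable<T> data);
}

// Hypothetical initializer: maps each broadcast pattern to the position
// at which it first appears in the broadcast sequence.
class PatternIndexInitializer
        implements BroadcastVariableInitializer<String, Map<String, Integer>> {

    @Override
    public Map<String, Integer> initializeBroadcastVariable(Iterable<String> data) {
        Map<String, Integer> index = new HashMap<>();
        int position = 0;
        for (String pattern : data) {
            // Keep the first position if a pattern occurs more than once.
            index.putIfAbsent(pattern, position);
            position++;
        }
        // The structure is shared by all parallel tasks on one machine,
        // so hand out a read-only view to avoid manual synchronization.
        return Collections.unmodifiableMap(index);
    }
}
```

Inside a rich function's open() method, this would be used roughly as getRuntimeContext().getBroadcastVariableWithInitializer("patterns", new PatternIndexInitializer()), where "patterns" is a made-up name for the registered broadcast set. Returning a read-only map is one way to sidestep the synchronization requirement; a mutable structure would need explicit locking by the caller.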
Have a look at the getBroadcastVariableWithInitializer() method:

    /**
     * Returns the result bound to the broadcast variable identified by the
     * given {@code name}. The broadcast variable is returned as a shared data structure
     * that is initialized with the given {@link BroadcastVariableInitializer}.
     * <p>
     * IMPORTANT: The broadcast variable data structure is shared between the parallel
     * tasks on one machine. Any access that modifies its internal state needs to
     * be manually synchronized by the caller.
     *
     * @param name The name under which the broadcast variable is registered;
     * @param initializer The initializer that creates the shared data structure of the broadcast
     *                    variable from the sequence of elements.
     * @return The broadcast variable, materialized as a list of elements.
     */
    <T, C> C getBroadcastVariableWithInitializer(String name, BroadcastVariableInitializer<T, C> initializer);

Right now, there is no easy way to run multiple tasks one after the other that I am aware of. However, we are working on materializing intermediate results. Once this feature is available, it should be easy to do the grep steps one by one.

Cheers,
Fabian