Hi Stefan,

Flink keeps a single instance of a broadcast variable per machine, shared by
all parallel tasks running there.
Flink can also load the broadcast variable into a custom data structure.

Have a look at the getBroadcastVariableWithInitializer() method of the
RuntimeContext:

/**
 * Returns the result bound to the broadcast variable identified by the
 * given {@code name}. The broadcast variable is returned as a shared data
 * structure that is initialized with the given {@link BroadcastVariableInitializer}.
 * <p>
 * IMPORTANT: The broadcast variable data structure is shared between the
 *            parallel tasks on one machine. Any access that modifies its
 *            internal state needs to be manually synchronized by the caller.
 *
 * @param name The name under which the broadcast variable is registered.
 * @param initializer The initializer that creates the shared data structure
 *                    of the broadcast variable from the sequence of elements.
 * @return The broadcast variable, materialized as the data structure created
 *         by the initializer.
 */
<T, C> C getBroadcastVariableWithInitializer(String name,
        BroadcastVariableInitializer<T, C> initializer);
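To make the sharing semantics concrete, here is a minimal, self-contained sketch that only simulates the idea with plain JDK classes; it does not use Flink and does not reproduce Flink's implementation. Initializer and MachineLocalBroadcasts are hypothetical stand-ins for BroadcastVariableInitializer and the runtime's per-machine handling, the "terms" variable is made up, and threads play the role of parallel tasks. The initializer runs once, every task gets the same object, and because the structure is read-only, the tasks need no extra synchronization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedBroadcastSketch {

    // Hypothetical stand-in for Flink's BroadcastVariableInitializer<T, C>:
    // turns the broadcast elements into a custom data structure C.
    interface Initializer<T, C> {
        C initialize(List<T> elements);
    }

    // Hypothetical per-machine registry: builds each named structure once and
    // hands the same instance to every task that asks for it.
    static final class MachineLocalBroadcasts {
        private final Map<String, List<?>> elements;
        private final Map<String, Object> initialized = new ConcurrentHashMap<>();

        MachineLocalBroadcasts(Map<String, List<?>> elements) {
            this.elements = elements;
        }

        @SuppressWarnings("unchecked")
        <T, C> C get(String name, Initializer<T, C> initializer) {
            List<?> raw = elements.get(name);
            if (raw == null) {
                throw new IllegalArgumentException("No broadcast variable named " + name);
            }
            // ConcurrentHashMap.computeIfAbsent applies the function at most once per key.
            return (C) initialized.computeIfAbsent(name, n -> initializer.initialize((List<T>) raw));
        }
    }

    // Simulates `parallelism` tasks on one machine, each requesting the same
    // broadcast variable. Returns {initializer calls, distinct instances seen}.
    static int[] simulate(int parallelism) {
        Map<String, List<?>> data = new HashMap<>();
        data.put("terms", Arrays.asList("flink", "grep", "flink"));
        MachineLocalBroadcasts broadcasts = new MachineLocalBroadcasts(data);
        AtomicInteger initCalls = new AtomicInteger();

        // Builds a read-only set, so tasks can only read it and need no locking.
        Initializer<String, Set<String>> toSet = elements -> {
            initCalls.incrementAndGet();
            return Collections.unmodifiableSet(new HashSet<>(elements));
        };

        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Set<String>>> results = new ArrayList<>();
            for (int i = 0; i < parallelism; i++) {
                results.add(pool.submit(() -> broadcasts.get("terms", toSet)));
            }
            // Compare by identity: did every task get the very same object?
            Set<Set<String>> seen = Collections.newSetFromMap(new IdentityHashMap<>());
            for (Future<Set<String>> f : results) {
                seen.add(f.get());
            }
            return new int[] {initCalls.get(), seen.size()};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] r = simulate(4);
        // prints "initializer calls: 1, distinct instances: 1"
        System.out.println("initializer calls: " + r[0] + ", distinct instances: " + r[1]);
    }
}
```

In an actual Flink job, the corresponding call would typically go into the open() method of a rich function, e.g. getRuntimeContext().getBroadcastVariableWithInitializer("terms", initializer), with the data set registered on the operator via withBroadcastSet(...). If the tasks do need to modify the shared structure, they would have to synchronize on it themselves, as the Javadoc says.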

Right now, I am not aware of an easy way to run multiple tasks one after the
other. However, we are working on materializing intermediate results. Once
this feature is available, it should be easy to run the grep steps one by one.

Cheers, Fabian