Re: Expressing `grep` with many search terms in Flink

Stephan Ewen Thu, 05 Feb 2015 01:36:05 -0800

Regarding your question about how to run the programs one after another:

In the main method of the program, you can simply put a loop around the
part between "getExecutionEnvironment()" and "env.execute()". That way, you
trigger the programs one after another.
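
A rough sketch of that control flow, written in plain Java so it runs without Flink on the classpath. The class and method names (SequentialGrepJobs, runOneJob, runAll) are made up for illustration. The Flink calls appear only in comments, and the in-memory filter is a stand-in for the real plan. It relies on the batch env.execute() call returning only once the job has finished, which as far as I know is how it behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SequentialGrepJobs {

    // Hypothetical stand-in for one complete Flink program. With Flink, the
    // body would start with
    //   ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // then define the plan for this term (read input, filter, write output),
    // and end with
    //   env.execute("grep " + term);
    // Here the "job" is just an in-memory substring filter.
    static List<String> runOneJob(String term, List<String> lines) {
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            if (line.contains(term)) {
                matches.add(line);
            }
        }
        return matches;
    }

    // The loop wraps the whole program body, so the job for the next term is
    // only started after the job for the previous term has returned.
    static Map<String, List<String>> runAll(List<String> terms, List<String> lines) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (String term : terms) {
            results.put(term, runOneJob(term, lines));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "error: disk full", "info: started", "warn: slow disk");
        Map<String, List<String>> results =
                runAll(Arrays.asList("error", "disk"), lines);
        for (Map.Entry<String, List<String>> entry : results.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Each iteration builds and runs its own program, so the input is read again for every search term. That is the price of running the steps strictly one by one.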




On Wed, Feb 4, 2015 at 9:34 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Stefan,
>
> Flink keeps only one copy of a broadcast variable per machine, shared by
> all parallel tasks on that machine.
> Flink can also load the broadcast variable into a custom data structure.
>
> Have a look at the getBroadcastVariableWithInitializer() method:
>
> /**
>  * Returns the result bound to the broadcast variable identified by the
>  * given {@code name}. The broadcast variable is returned as a shared data structure
>  * that is initialized with the given {@link BroadcastVariableInitializer}.
>  * <p>
>  * IMPORTANT: The broadcast variable data structure is shared between the parallel
>  *            tasks on one machine. Any access that modifies its internal state needs to
>  *            be manually synchronized by the caller.
>  *
>  * @param name The name under which the broadcast variable is registered;
>  * @param initializer The initializer that creates the shared data structure of the broadcast
>  *                    variable from the sequence of elements.
>  * @return The broadcast variable, materialized as a list of elements.
>  */
> <T, C> C getBroadcastVariableWithInitializer(String name,
>         BroadcastVariableInitializer<T, C> initializer);
>
> Right now, there is no easy way to run multiple tasks one after the other
> that I am aware of.
> However, we are working on materializing intermediate results. Once this
> feature is available, it should be easy to do the grep steps one by one.
>
> Cheers, Fabian
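
To illustrate the custom data structure mentioned in the quoted reply, here is a minimal sketch of an initializer that turns the broadcast search terms into a set. It is plain Java so it runs without Flink. The nested Initializer interface is a local stand-in that mirrors the <T, C> shape of the quoted signature, and the single-method form is my assumption about Flink's BroadcastVariableInitializer. All class names and the whole-word matching rule are hypothetical. In a real job the initializer would presumably be passed to getBroadcastVariableWithInitializer() from a rich function's open() method.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class SearchTermLookup {

    // Local stand-in (assumption): same generic shape as the
    // BroadcastVariableInitializer<T, C> in the quoted signature.
    interface Initializer<T, C> {
        C initializeBroadcastVariable(Iterable<T> data);
    }

    // Builds the shared structure once from the broadcast elements.
    // The set is wrapped as unmodifiable because the structure is shared
    // between the parallel tasks on one machine; read-only access needs
    // no manual synchronization.
    static class TermSetInitializer implements Initializer<String, Set<String>> {
        @Override
        public Set<String> initializeBroadcastVariable(Iterable<String> data) {
            Set<String> terms = new HashSet<>();
            for (String term : data) {
                terms.add(term);
            }
            return Collections.unmodifiableSet(terms);
        }
    }

    // Simplified matching rule (assumption): a line matches if any of its
    // whitespace-separated tokens is one of the search terms.
    static boolean matches(String line, Set<String> terms) {
        for (String token : line.split("\\s+")) {
            if (terms.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> terms = new TermSetInitializer()
                .initializeBroadcastVariable(Arrays.asList("flink", "grep", "flink"));
        System.out.println(matches("run grep now", terms));
        System.out.println(matches("nothing here", terms));
    }
}
```

With a set, each token costs one hash lookup regardless of how many search terms there are, whereas scanning the broadcast list would cost one comparison per term.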
