Re: Expressing `grep` with many search terms in Flink

2015-02-05 Thread Stephan Ewen
Concerning your question of how to run the programs one after another: in the main method of the program, you can simply put a loop around the part between "getExecutionEnvironment()" and "env.execute()". That way, you trigger the programs one after another. On Wed, Feb 4, 2015 at 9:34 PM, Fabian

Re: Expressing `grep` with many search terms in Flink

2015-02-04 Thread Fabian Hueske
Hi Stefan, Flink uses only one broadcast variable for all parallel tasks on one machine. Flink can also load the broadcast variable into a custom data structure. Have a look at the getBroadcastVariableWithInitializer() method: /** * Returns the result bound to the broadcast variable identified

Expressing `grep` with many search terms in Flink

2015-02-04 Thread Stefan Bunk
Hi Squirrels, I have some trouble expressing my use case in Flink terms, so I am asking for your help: I have five million documents and fourteen million search terms. For each search term, I want to know how many documents it occurs in. So basically a `grep` with very many search terms. My curre
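
For readers of the archive, the quantity asked for above (per search term, the number of documents containing it) can be pinned down with a small single-machine baseline. This is only a sketch in plain Java, not Flink code and not the approach from the thread. The class name `TermDocumentCount`, the method `countDocumentsPerTerm`, and the sample data are hypothetical, and plain substring matching is an assumption about what "occurs" means.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TermDocumentCount {

    // For each search term, count the documents that contain it at least once.
    // Assumption: "occurs" means plain substring matching (grep without regex).
    static Map<String, Integer> countDocumentsPerTerm(List<String> documents, List<String> terms) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : terms) {
            int matchingDocuments = 0;
            for (String document : documents) {
                // A document counts once, no matter how often the term appears in it.
                if (document.contains(term)) {
                    matchingDocuments++;
                }
            }
            counts.put(term, matchingDocuments);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, for illustration only.
        List<String> documents = Arrays.asList(
            "flink runs batch programs",
            "a squirrel is the flink mascot",
            "grep searches text");
        List<String> terms = Arrays.asList("flink", "grep", "spark");

        // prints {flink=2, grep=1, spark=0}
        System.out.println(countDocumentsPerTerm(documents, terms));
    }
}
```

The nested loop makes the cost explicit: every term is checked against every document, which is why five million documents and fourteen million terms call for a distributed formulation rather than this baseline.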