Is there any other way to apply the function in parallel and return the result to the client in parallel?
Thanks Juan On Mon, 2015-06-29 at 15:01 +0200, Stephan Ewen wrote: > In general, avoid collect if you can. Collect brings data top the > client, where the computation is not parallel any more. > > > Try to do as much on the DataSet as possible. > > On Mon, Jun 29, 2015 at 2:58 PM, Juan Fumero > <juan.jose.fumero.alfo...@oracle.com> wrote: > Hi Stephan, > so should I use another method instead of collect? It seems > multithread is not working with this. > > > Juan > > On Mon, 2015-06-29 at 14:51 +0200, Stephan Ewen wrote: > > Hi Juan! > > > > > > This is an artifact of a workaround right now. The actual > collect() > > logic happens in the flatMap() and the sink is a dummy that > executes > > nothing. The flatMap writes the data to be collected to the > > "accumulator" that delivers it back. > > > > > > Greetings, > > Stephan > > > > > > > > On Mon, Jun 29, 2015 at 2:30 PM, Juan Fumero > > <juan.jose.fumero.alfo...@oracle.com> wrote: > > Hi, > > I am starting with Flink. I have tried to look for > the > > documentation but I havent found it clear. > > > > I wonder the difference between these two states: > > > > FlatMap RUNNING vs DataSink RUNNIG. > > > > FlatMap is doing data any data transformation? > Compilation? In > > which point is actually executing the function > provided in the > > MapFunction? How could I know exactly the time for > the kernel > > computation? > > > > It seems is using one thread in this step, even > though I > > specified 16 threads in the createLocalEnvironment. > > > > CHAIN DataSource (at > applyFunction(ApplyFunction.java:96) > > > (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map > > (Map at applyFunction(ApplyFunction.java:108)) -> > FlatMap > > (collect())(1/1) switched to RUNNING > > > > Here is running only one thread for almost 35 > seconds. > > > > The rest of the execution is very fast (less than > one second > > for computing the square of an array of 500000 > integer > > elements) > > > > Thanks > > Juan > > > > Here the full log. > > > > 06/29/2015 14:13:25 Job execution switched to status > RUNNING. > > 06/29/2015 14:13:25 CHAIN DataSource (at > > applyFunction(ApplyFunction.java:96) > > > (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map > > (Map at applyFunction(ApplyFunction.java:108)) -> > FlatMap > > (collect())(1/1) switched to SCHEDULED > > 06/29/2015 14:13:25 CHAIN DataSource (at > > applyFunction(ApplyFunction.java:96) > > > (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map > > (Map at applyFunction(ApplyFunction.java:108)) -> > FlatMap > > (collect())(1/1) switched to DEPLOYING > > 06/29/2015 14:13:26 CHAIN DataSource (at > > applyFunction(ApplyFunction.java:96) > > > (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map > > (Map at applyFunction(ApplyFunction.java:108)) -> > FlatMap > > (collect())(1/1) switched to RUNNING > > 06/29/2015 14:14:01 DataSink (collect() sink)(1/1) > switched to > > SCHEDULED > > 06/29/2015 14:14:01 CHAIN DataSource (at > > applyFunction(ApplyFunction.java:96) > > > (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map > > (Map at applyFunction(ApplyFunction.java:108)) -> > FlatMap > > (collect())(1/1) switched to FINISHED > > 06/29/2015 14:14:01 DataSink (collect() sink)(1/1) > switched to > > DEPLOYING > > 06/29/2015 14:14:01 DataSink (collect() sink)(1/1) > switched to > > RUNNING > > 06/29/2015 14:14:01 DataSink (collect() sink)(1/1) > switched to > > FINISHED > > 06/29/2015 14:14:01 Job execution switched to status > FINISHED. > > > > > > > > > > > > > > > > >