It is not quite easy to understand what you are trying to do.

Can you post your program here? Then we can take a look and give you a good
answer...

On Mon, Jun 29, 2015 at 3:47 PM, Juan Fumero <
juan.jose.fumero.alfo...@oracle.com> wrote:

> Is there any other way to apply the function in parallel and return the
> result to the client in parallel?
>
> Thanks
> Juan
>
> On Mon, 2015-06-29 at 15:01 +0200, Stephan Ewen wrote:
> > In general, avoid collect if you can. Collect brings data top the
> > client, where the computation is not parallel any more.
> >
> >
> > Try to do as much on the DataSet as possible.
> >
> > On Mon, Jun 29, 2015 at 2:58 PM, Juan Fumero
> > <juan.jose.fumero.alfo...@oracle.com> wrote:
> >         Hi Stephan,
> >           so should I use another method instead of collect? It seems
> >         multithread is not working with this.
> >
> >
> >         Juan
> >
> >         On Mon, 2015-06-29 at 14:51 +0200, Stephan Ewen wrote:
> >         > Hi Juan!
> >         >
> >         >
> >         > This is an artifact of a workaround right now. The actual
> >         collect()
> >         > logic happens in the flatMap() and the sink is a dummy that
> >         executes
> >         > nothing. The flatMap writes the data to be collected to the
> >         > "accumulator" that delivers it back.
> >         >
> >         >
> >         > Greetings,
> >         > Stephan
> >         >
> >         >
> >         >
> >         > On Mon, Jun 29, 2015 at 2:30 PM, Juan Fumero
> >         > <juan.jose.fumero.alfo...@oracle.com> wrote:
> >         >         Hi,
> >         >           I am starting with Flink. I have tried to look for
> >         the
> >         >         documentation but I havent found it clear.
> >         >
> >         >         I wonder the difference between these two states:
> >         >
> >         >         FlatMap RUNNING vs DataSink RUNNIG.
> >         >
> >         >         FlatMap is doing data any data transformation?
> >         Compilation? In
> >         >         which point is actually executing the function
> >         provided in the
> >         >         MapFunction? How could I know exactly the time for
> >         the kernel
> >         >         computation?
> >         >
> >         >         It seems is using one thread in this step, even
> >         though I
> >         >         specified 16 threads in the createLocalEnvironment.
> >         >
> >         >         CHAIN DataSource (at
> >         applyFunction(ApplyFunction.java:96)
> >         >
> >          (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map
> >         >         (Map at applyFunction(ApplyFunction.java:108)) ->
> >         FlatMap
> >         >         (collect())(1/1) switched to RUNNING
> >         >
> >         >         Here is running only one thread for almost 35
> >         seconds.
> >         >
> >         >         The rest of the execution is very fast (less than
> >         one second
> >         >         for computing the square of an array of 500000
> >         integer
> >         >         elements)
> >         >
> >         >         Thanks
> >         >         Juan
> >         >
> >         >         Here the full log.
> >         >
> >         >         06/29/2015 14:13:25 Job execution switched to status
> >         RUNNING.
> >         >         06/29/2015 14:13:25 CHAIN DataSource (at
> >         >         applyFunction(ApplyFunction.java:96)
> >         >
> >          (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map
> >         >         (Map at applyFunction(ApplyFunction.java:108)) ->
> >         FlatMap
> >         >         (collect())(1/1) switched to SCHEDULED
> >         >         06/29/2015 14:13:25 CHAIN DataSource (at
> >         >         applyFunction(ApplyFunction.java:96)
> >         >
> >          (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map
> >         >         (Map at applyFunction(ApplyFunction.java:108)) ->
> >         FlatMap
> >         >         (collect())(1/1) switched to DEPLOYING
> >         >         06/29/2015 14:13:26 CHAIN DataSource (at
> >         >         applyFunction(ApplyFunction.java:96)
> >         >
> >          (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map
> >         >         (Map at applyFunction(ApplyFunction.java:108)) ->
> >         FlatMap
> >         >         (collect())(1/1) switched to RUNNING
> >         >         06/29/2015 14:14:01 DataSink (collect() sink)(1/1)
> >         switched to
> >         >         SCHEDULED
> >         >         06/29/2015 14:14:01 CHAIN DataSource (at
> >         >         applyFunction(ApplyFunction.java:96)
> >         >
> >          (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map
> >         >         (Map at applyFunction(ApplyFunction.java:108)) ->
> >         FlatMap
> >         >         (collect())(1/1) switched to FINISHED
> >         >         06/29/2015 14:14:01 DataSink (collect() sink)(1/1)
> >         switched to
> >         >         DEPLOYING
> >         >         06/29/2015 14:14:01 DataSink (collect() sink)(1/1)
> >         switched to
> >         >         RUNNING
> >         >         06/29/2015 14:14:01 DataSink (collect() sink)(1/1)
> >         switched to
> >         >         FINISHED
> >         >         06/29/2015 14:14:01 Job execution switched to status
> >         FINISHED.
> >         >
> >         >
> >         >
> >         >
> >         >
> >         >
> >
> >
> >
> >
> >
>
>
>

Reply via email to