Hi, I am starting with Flink. I have tried to look for the documentation but I havent found it clear.
I wonder the difference between these two states: FlatMap RUNNING vs DataSink RUNNIG. FlatMap is doing data any data transformation? Compilation? In which point is actually executing the function provided in the MapFunction? How could I know exactly the time for the kernel computation? It seems is using one thread in this step, even though I specified 16 threads in the createLocalEnvironment. CHAIN DataSource (at applyFunction(ApplyFunction.java:96) (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at applyFunction(ApplyFunction.java:108)) -> FlatMap (collect())(1/1) switched to RUNNING Here is running only one thread for almost 35 seconds. The rest of the execution is very fast (less than one second for computing the square of an array of 500000 integer elements) Thanks Juan Here the full log. 06/29/2015 14:13:25 Job execution switched to status RUNNING. 06/29/2015 14:13:25 CHAIN DataSource (at applyFunction(ApplyFunction.java:96) (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at applyFunction(ApplyFunction.java:108)) -> FlatMap (collect())(1/1) switched to SCHEDULED 06/29/2015 14:13:25 CHAIN DataSource (at applyFunction(ApplyFunction.java:96) (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at applyFunction(ApplyFunction.java:108)) -> FlatMap (collect())(1/1) switched to DEPLOYING 06/29/2015 14:13:26 CHAIN DataSource (at applyFunction(ApplyFunction.java:96) (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at applyFunction(ApplyFunction.java:108)) -> FlatMap (collect())(1/1) switched to RUNNING 06/29/2015 14:14:01 DataSink (collect() sink)(1/1) switched to SCHEDULED 06/29/2015 14:14:01 CHAIN DataSource (at applyFunction(ApplyFunction.java:96) (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at applyFunction(ApplyFunction.java:108)) -> FlatMap (collect())(1/1) switched to FINISHED 06/29/2015 14:14:01 DataSink (collect() sink)(1/1) switched to DEPLOYING 06/29/2015 14:14:01 DataSink (collect() sink)(1/1) switched to RUNNING 06/29/2015 14:14:01 DataSink (collect() sink)(1/1) switched to FINISHED 06/29/2015 14:14:01 Job execution switched to status FINISHED.