One thing I forgot the add. I also have a Storm-WordCount job (build via FlinkTopologyBuilder) that uses the same "buffer-file-and-emit-over-and-over-again-pattern" in a spout. This job run just fine and stops regularly after 5 minutes.
-Matthias
On 10/14/2015 10:42 PM, Matthias J. Sax wrote:
> No. See log below.
>
> Btw: the job is not cleaned up properly. Some task remain in state
> "Canceling".
>
> The program I execute is "Streaming WordCount" example with my own
> source function. This custom source (see below), reads a local (small)
> file, bufferes each line in an internal buffer, and emits the buffered
> lines over and over again (for 5 minutes). (A copy of the file is
> located on every worker machine of course.) This is of course no common
> patttern; I just do this, because I have only a small file and want the
> job to run longer.
>
> As the job is running for multiple minutes until it fails, I assume that
> this uncommon pattern should actually not cauase any problem. I just
> report it, to make sure it is not the problem (hope you can confirm).
>
> -Matthias
>
>
>> private static final class SimpleFileSource implements
>> ParallelSourceFunction<String> {
>> private static final long serialVersionUID =
>> -5727198990308179551L;
>>
>> volatile boolean running = true;
>> @Override
>> public void run(
>>
>> org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext<String>
>> ctx)
>> throws Exception {
>>
>> BufferedReader reader = new BufferedReader(new
>> FileReader(
>> "/data/mjsax/hamlet.txt"));
>> ArrayList<String> buffer = new ArrayList<String>();
>> int counter = 0;
>> String line;
>>
>> long start = System.nanoTime();
>> while (running && (line = reader.readLine()) != null) {
>> buffer.add(line);
>> ctx.collect(line);
>> }
>> reader.close();
>>
>> while (running && (System.nanoTime() - start) < 5L * 60
>> * 1000 * 1000 * 1000) {
>> ctx.collect(buffer.get(counter));
>> counter = ++counter % buffer.size();
>> }
>> }
>>
>> @Override
>> public void cancel() {
>> running = false;
>> }
>> }
>
>
> JobManager log:
>
>> 22:19:16,618 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (2/28)
>> (29bb077aa0772c0c42a21b782d496b7c) switched from DEPLOYING to RUNNING
>> 22:22:52,082 WARN akka.remote.RemoteWatcher
>> - Detected unreachable: [akka.tcp://[email protected]:32994]
>> 22:22:52,083 INFO org.apache.flink.runtime.jobmanager.JobManager
>> - Task manager akka.tcp://[email protected]:32994/user/taskmanager
>> terminated.
>> 22:22:52,084 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (8/28)
>> (b3fb63bdac8f446e8bbaa015f30abfdc) switched from RUNNING to FAILED
>> 22:22:52,093 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (1/28)
>> (e18d7fb12ea2b5b88dda07e79a6c62ba) switched from RUNNING to CANCELING
>> 22:22:52,096 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (2/28)
>> (b5af8ba1031971c07b8655bdf62acdea) switched from RUNNING to CANCELING
>> 22:22:52,096 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (3/28)
>> (47d0cbfb8472a50ac005305dfab91202) switched from RUNNING to CANCELING
>> 22:22:52,096 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (4/28)
>> (39144b5a2aa58be4667bdc38c50602b5) switched from RUNNING to CANCELING
>> 22:22:52,096 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (5/28)
>> (c6dcb3a5d755b162dbe13f32de406a1d) switched from RUNNING to CANCELING
>> 22:22:52,097 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (6/28)
>> (09d9649f2e9c83dbbe3d1b253651f136) switched from RUNNING to CANCELING
>> 22:22:52,097 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (7/28)
>> (f82fda5a32c0f96acda4185ade9e985f) switched from RUNNING to CANCELING
>> 22:22:52,097 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (9/28)
>> (dc9d2905ae42980a25c8870a1d89c056) switched from RUNNING to CANCELING
>> 22:22:52,097 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (10/28)
>> (2e9682428b32fca6fca787e1a6b71359) switched from RUNNING to CANCELING
>> 22:22:52,097 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (11/28)
>> (bae79dd7f2b9346c7d168db8143c0bd4) switched from RUNNING to CANCELING
>> 22:22:52,098 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (12/28)
>> (642c76fc78faafcb023bce9cec041e9d) switched from RUNNING to CANCELING
>> 22:22:52,098 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (13/28)
>> (1cf21c3abc8f91f82b5fb6175181baeb) switched from RUNNING to CANCELING
>> 22:22:52,098 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (14/28)
>> (97fe45f48434c568652a9f9cd885b2da) switched from RUNNING to CANCELING
>> 22:22:52,098 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (15/28)
>> (b8833cdda3fc205865ef611d176b33ea) switched from RUNNING to CANCELING
>> 22:22:52,098 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (16/28)
>> (157a107c35ac994503cca577389a4f80) switched from RUNNING to CANCELING
>> 22:22:52,099 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (17/28)
>> (509d1dacb4448b741aaf170009a31364) switched from RUNNING to CANCELING
>> 22:22:52,099 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (18/28)
>> (d6bce51dd6d00b6b2f7e59ffcc733242) switched from RUNNING to CANCELING
>> 22:22:52,099 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (19/28)
>> (dc6931d39b054b761cfed7345d683c29) switched from RUNNING to CANCELING
>> 22:22:52,099 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (20/28)
>> (0f288fa6cc5910235e32564c1740670f) switched from RUNNING to CANCELING
>> 22:22:52,099 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (21/28)
>> (80b811c2787d34c67941686e3c2be74a) switched from RUNNING to CANCELING
>> 22:22:52,100 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (22/28)
>> (c70d69252e7b3ff6a537a2b09180d818) switched from RUNNING to CANCELING
>> 22:22:52,100 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (23/28)
>> (29154b2299dd080e041ca8a349d3b418) switched from RUNNING to CANCELING
>> 22:22:52,100 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (24/28)
>> (f56827e4079567adb23406c44f765dcf) switched from RUNNING to CANCELING
>> 22:22:52,100 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (25/28)
>> (cd6f30e3a69b02367a6ddd097f9fb825) switched from RUNNING to CANCELING
>> 22:22:52,100 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (26/28)
>> (c4b3af71e26718513752595a6cbdcee5) switched from RUNNING to CANCELING
>> 22:22:52,101 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (27/28)
>> (b152ab3209ca6c68c6f20213ccb48f40) switched from RUNNING to CANCELING
>> 22:22:52,101 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (28/28)
>> (efcee09b90ca372bffa4d086ce0b104a) switched from RUNNING to CANCELING
>> 22:22:52,101 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (1/28)
>> (b2189b3d40cb4fa24083ffefd43ffc4f) switched from RUNNING to CANCELING
>> 22:22:52,101 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (2/28)
>> (29bb077aa0772c0c42a21b782d496b7c) switched from RUNNING to CANCELING
>> 22:22:52,101 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (3/28)
>> (2eb046c3d88cb7e6deccb5ac9d3a3704) switched from RUNNING to CANCELING
>> 22:22:52,102 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (4/28)
>> (63f6696e53f08dc420bb51a10a092531) switched from RUNNING to CANCELING
>> 22:22:52,102 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (5/28)
>> (bd9c6fc6d89e64dd3bef9c21065dfc87) switched from RUNNING to CANCELING
>> 22:22:52,102 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (6/28)
>> (b675c869f60552cc04c3842d297a2529) switched from RUNNING to CANCELING
>> 22:22:52,102 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (7/28)
>> (3f56dfd2ff3fe595555be6a372bf7c7b) switched from RUNNING to CANCELING
>> 22:22:52,102 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (8/28)
>> (94d29535cf2559e5fe6dc13b8a5e3f08) switched from RUNNING to CANCELING
>> 22:22:52,103 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (9/28)
>> (5f99a88b2041719d202f20bbc5dd2ca6) switched from RUNNING to CANCELING
>> 22:22:52,103 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (10/28)
>> (7c1a75ec9794ae9197ac3e64ecbd4ad8) switched from RUNNING to CANCELING
>> 22:22:52,103 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (11/28)
>> (c110922406d6f58cd34c80abf1ace972) switched from RUNNING to CANCELING
>> 22:22:52,103 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (12/28)
>> (2f991c806e12dd9715da532f35d3cd83) switched from RUNNING to CANCELING
>> 22:22:52,103 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (13/28)
>> (4b59d9e7f5102f7a5d1c09e49ffca66e) switched from RUNNING to CANCELING
>> 22:22:52,104 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (14/28)
>> (6b966d3247685983a73b439666fe315a) switched from RUNNING to CANCELING
>> 22:22:52,104 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (15/28)
>> (b7f040b44a87db85c41b7f8e1808aecc) switched from RUNNING to CANCELING
>> 22:22:52,104 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (16/28)
>> (1a42f51a278bef23c497da8d34a28e91) switched from RUNNING to CANCELING
>> 22:22:52,104 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (17/28)
>> (b3d86978bc1bf04e2cab2700fb6975e0) switched from RUNNING to CANCELING
>> 22:22:52,104 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (18/28)
>> (a44cb0a419524a0b7e04d476285b03c0) switched from RUNNING to CANCELING
>> 22:22:52,104 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (19/28)
>> (0d2b87be2ec0eb7961c05a4cf7baf7a7) switched from RUNNING to CANCELING
>> 22:22:52,105 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (20/28)
>> (ebd6220fc5d98912811c19aa9fe29e63) switched from RUNNING to CANCELING
>> 22:22:52,105 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (21/28)
>> (6cd95e1bb7c22e310967eab4149fa93d) switched from RUNNING to CANCELING
>> 22:22:52,105 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (22/28)
>> (ddc9b7f1c122e2d36632c8f8e886d35b) switched from RUNNING to CANCELING
>> 22:22:52,105 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (23/28)
>> (0e3b8cc17d0169a12142532dd800f966) switched from RUNNING to CANCELING
>> 22:22:52,105 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (24/28)
>> (4ac383b01d8c5a2d395612202d1f876c) switched from RUNNING to CANCELING
>> 22:22:52,106 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (25/28)
>> (bf542fee30071b7cfa63bae514af6106) switched from RUNNING to CANCELING
>> 22:22:52,106 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (26/28)
>> (e60b40069c1c82c44318e053eacfa7bd) switched from RUNNING to CANCELING
>> 22:22:52,106 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (27/28)
>> (35d1ce78a6f61a75af41c7b6d9c403a9) switched from RUNNING to CANCELING
>> 22:22:52,106 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (28/28)
>> (de58a786dbd2557dcc7c54bf2710525b) switched from RUNNING to CANCELING
>> 22:22:52,107 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (16/28)
>> (1a42f51a278bef23c497da8d34a28e91) switched from CANCELING to FAILED
>> 22:22:52,107 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (28/28)
>> (efcee09b90ca372bffa4d086ce0b104a) switched from CANCELING to FAILED
>> 22:22:52,108 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (15/28)
>> (b7f040b44a87db85c41b7f8e1808aecc) switched from CANCELING to FAILED
>> 22:22:52,108 INFO org.apache.flink.runtime.instance.InstanceManager
>> - Unregistered task manager
>> akka.tcp://[email protected]:32994/user/taskmanager. Number of registered
>> task managers 19. Number of available slots 228.
>> 22:22:52,109 INFO org.apache.flink.runtime.jobmanager.JobManager
>> - Status of job 11b9636948e49d96965051902b0aee1f (Streaming WordCount)
>> changed to FAILING.
>> java.lang.Exception: The slot in which the task was executed has been
>> released. Probably loss of TaskManager 8175fe04b5cad804b197fbd1f8e2a599 @
>> dbis32 - 12 slots - URL:
>> akka.tcp://[email protected]:32994/user/taskmanager
>> at
>> org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:151)
>> at
>> org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
>> at
>> org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
>> at
>> org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
>> at
>> org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
>> at
>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:565)
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>> at
>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>> at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>> at
>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>> at
>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>> at
>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>> at
>> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:103)
>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>> at
>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>> at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>> at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>> at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>> at
>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> at
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 22:22:52,132 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (15/28)
>> (b8833cdda3fc205865ef611d176b33ea) switched from CANCELING to CANCELED
>> 22:22:52,132 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (11/28)
>> (bae79dd7f2b9346c7d168db8143c0bd4) switched from CANCELING to CANCELED
>> 22:22:52,133 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (9/28)
>> (dc9d2905ae42980a25c8870a1d89c056) switched from CANCELING to CANCELED
>> 22:22:52,133 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (18/28)
>> (d6bce51dd6d00b6b2f7e59ffcc733242) switched from CANCELING to CANCELED
>> 22:22:52,134 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (6/28)
>> (09d9649f2e9c83dbbe3d1b253651f136) switched from CANCELING to CANCELED
>> 22:22:52,135 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (17/28)
>> (509d1dacb4448b741aaf170009a31364) switched from CANCELING to CANCELED
>> 22:22:52,135 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (26/28)
>> (c4b3af71e26718513752595a6cbdcee5) switched from CANCELING to CANCELED
>> 22:22:52,135 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (5/28)
>> (c6dcb3a5d755b162dbe13f32de406a1d) switched from CANCELING to CANCELED
>> 22:22:52,136 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (14/28)
>> (97fe45f48434c568652a9f9cd885b2da) switched from CANCELING to CANCELED
>> 22:22:52,136 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (25/28)
>> (cd6f30e3a69b02367a6ddd097f9fb825) switched from CANCELING to CANCELED
>> 22:22:52,137 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (13/28)
>> (1cf21c3abc8f91f82b5fb6175181baeb) switched from CANCELING to CANCELED
>> 22:22:52,137 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (10/28)
>> (2e9682428b32fca6fca787e1a6b71359) switched from CANCELING to CANCELED
>> 22:22:52,137 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (19/28)
>> (dc6931d39b054b761cfed7345d683c29) switched from CANCELING to CANCELED
>> 22:22:52,139 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (7/28)
>> (f82fda5a32c0f96acda4185ade9e985f) switched from CANCELING to CANCELED
>> 22:22:52,139 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (4/28)
>> (39144b5a2aa58be4667bdc38c50602b5) switched from CANCELING to CANCELED
>> 22:22:52,139 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (2/28)
>> (b5af8ba1031971c07b8655bdf62acdea) switched from CANCELING to CANCELED
>> 22:22:52,139 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (24/28)
>> (f56827e4079567adb23406c44f765dcf) switched from CANCELING to CANCELED
>> 22:22:52,140 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (22/28)
>> (c70d69252e7b3ff6a537a2b09180d818) switched from CANCELING to CANCELED
>> 22:22:52,140 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (3/28)
>> (47d0cbfb8472a50ac005305dfab91202) switched from CANCELING to CANCELED
>> 22:22:52,140 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (27/28)
>> (b152ab3209ca6c68c6f20213ccb48f40) switched from CANCELING to CANCELED
>> 22:22:52,141 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (23/28)
>> (29154b2299dd080e041ca8a349d3b418) switched from CANCELING to CANCELED
>> 22:22:52,141 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (20/28)
>> (0f288fa6cc5910235e32564c1740670f) switched from CANCELING to CANCELED
>> 22:22:52,142 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (12/28)
>> (642c76fc78faafcb023bce9cec041e9d) switched from CANCELING to CANCELED
>> 22:22:52,142 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (16/28)
>> (157a107c35ac994503cca577389a4f80) switched from CANCELING to CANCELED
>> 22:22:57,003 INFO Remoting
>> - Quarantined address [akka.tcp://[email protected]:32994] is still
>> unreachable or has not been restarted. Keeping it quarantined.
>> 22:23:01,078 WARN akka.remote.RemoteWatcher
>> - Detected unreachable: [akka.tcp://[email protected]:35242]
>> 22:23:01,079 INFO org.apache.flink.runtime.jobmanager.JobManager
>> - Task manager akka.tcp://[email protected]:35242/user/taskmanager
>> terminated.
>> 22:23:01,079 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (1/28)
>> (b2189b3d40cb4fa24083ffefd43ffc4f) switched from CANCELING to FAILED
>> 22:23:01,081 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (21/28)
>> (80b811c2787d34c67941686e3c2be74a) switched from CANCELING to FAILED
>> 22:23:01,082 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Keyed Aggregation -> Sink: Unnamed (2/28)
>> (29bb077aa0772c0c42a21b782d496b7c) switched from CANCELING to FAILED
>> 22:23:01,083 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
>> - Source: Custom Source -> Flat Map (1/28)
>> (e18d7fb12ea2b5b88dda07e79a6c62ba) switched from CANCELING to FAILED
>> 22:23:01,084 INFO org.apache.flink.runtime.instance.InstanceManager
>> - Unregistered task manager
>> akka.tcp://[email protected]:35242/user/taskmanager. Number of registered
>> task managers 18. Number of available slots 216.
>
>
>
>
> On 10/11/2015 11:54 PM, Stephan Ewen wrote:
>> Can you see is there is anything unusual in the JobManager logs?
>> Am 11.10.2015 18:56 schrieb "Matthias J. Sax" <[email protected]>:
>>
>>> Hi,
>>>
>>> I was just playing arround with Flink. After submitting my job, it runs
>>> for multiple minutes, until I get the following Exception in one if the
>>> TaskManager logs and the job fails.
>>>
>>> I have no clue what's going on...
>>>
>>> -Matthias
>>>
>>>
>>>> 18:43:23,567 WARN akka.remote.RemoteWatcher
>>> - Detected unreachable: [akka.tcp://[email protected]:6123]
>>>> 18:43:23,864 WARN akka.remote.ReliableDeliverySupervisor
>>> - Association with remote system [akka.tcp://
>>> [email protected]:6123] has failed, address is now gated for [5000]
>>> ms. Reason is: [Disassociated].
>>>> 18:43:23,866 INFO org.apache.flink.runtime.taskmanager.TaskManager
>>> - TaskManager akka://flink/user/taskmanager disconnects from
>>> JobManager akka.tcp://[email protected]:6123/user/jobmanager:
>>> JobManager is no longer reachable
>>>> 18:43:23,867 INFO org.apache.flink.runtime.taskmanager.TaskManager
>>> - Cancelling all computations and discarding all cached data.
>>>> 18:43:23,870 INFO org.apache.flink.runtime.taskmanager.Task
>>> - Attempting to fail task externally Keyed Aggregation -> Sink:
>>> Unnamed (1/28)
>>>> 18:43:23,870 INFO org.apache.flink.runtime.taskmanager.Task
>>> - Keyed Aggregation -> Sink: Unnamed (1/28) switched to FAILED
>>> with exception.
>>>> java.lang.Exception: TaskManager akka://flink/user/taskmanager
>>> disconnects from JobManager akka.tcp://
>>> [email protected]:6123/user/jobmanager: JobManager is no longer
>>> reachable
>>>> at
>>> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>>>> at
>>> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>>>> at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>>> at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>>> at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>>> at
>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>> at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>>> at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>>> at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>>> at
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>>> at
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>> at
>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>>> at
>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>> at
>>> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>> at
>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>> at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>> at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>> at
>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>> at
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>> at
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>> at
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>> 18:43:23,879 INFO org.apache.flink.runtime.taskmanager.Task
>>> - Triggering cancellation of task code Keyed Aggregation -> Sink:
>>> Unnamed (1/28) (4b1f979a60c30f84ee60553730a4e99a).
>>>> 18:43:23,880 INFO org.apache.flink.runtime.taskmanager.Task
>>> - Attempting to fail task externally Source: Custom Source -> Flat
>>> Map (1/28)
>>>> 18:43:23,880 INFO org.apache.flink.runtime.taskmanager.Task
>>> - Source: Custom Source -> Flat Map (1/28) switched to FAILED with
>>> exception.
>>>> java.lang.Exception: TaskManager akka://flink/user/taskmanager
>>> disconnects from JobManager akka.tcp://
>>> [email protected]:6123/user/jobmanager: JobManager is no longer
>>> reachable
>>>> at
>>> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>>>> at
>>> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>>>> at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>>> at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>>> at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>>> at
>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>> at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>>> at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>>> at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>>> at
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>>> at
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>> at
>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>>> at
>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>> at
>>> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>> at
>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>> at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>> at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>> at
>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>> at
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>> at
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>> at
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>
>>>
>>
>
signature.asc
Description: OpenPGP digital signature
