Robert Metzger created FLINK-4137:
-------------------------------------

Summary: JobManager web frontend does not shut down on OOM exception on JM
Key: FLINK-4137
URL: https://issues.apache.org/jira/browse/FLINK-4137
Project: Flink
Issue Type: Bug
Components: Distributed Coordination, JobManager, Webfrontend
Reporter: Robert Metzger
Priority: Critical
After the following exception on the JobManager, the ActorSystem shuts down and the JobManager stops, but the JobManager process does not exit:

{code}
2016-06-30 14:45:06,642 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed checkpoint 379 (in 7017 ms)
2016-06-30 14:45:06,642 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 380 @ 1467297906642
2016-06-30 14:45:17,902 ERROR akka.actor.ActorSystemImpl - Uncaught fatal error from thread [flink-akka.remote.default-remote-dispatcher-6] shutting down ActorSystem [flink]
java.lang.OutOfMemoryError: Java heap space
	at com.google.protobuf.ByteString.copyFrom(ByteString.java:192)
	at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:324)
	at akka.remote.WireFormats$SerializedMessage.<init>(WireFormats.java:3030)
	at akka.remote.WireFormats$SerializedMessage.<init>(WireFormats.java:2980)
	at akka.remote.WireFormats$SerializedMessage$1.parsePartialFrom(WireFormats.java:3073)
	at akka.remote.WireFormats$SerializedMessage$1.parsePartialFrom(WireFormats.java:3068)
	at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
	at akka.remote.WireFormats$RemoteEnvelope.<init>(WireFormats.java:993)
	at akka.remote.WireFormats$RemoteEnvelope.<init>(WireFormats.java:927)
	at akka.remote.WireFormats$RemoteEnvelope$1.parsePartialFrom(WireFormats.java:1049)
	at akka.remote.WireFormats$RemoteEnvelope$1.parsePartialFrom(WireFormats.java:1044)
	at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
	at akka.remote.WireFormats$AckAndEnvelopeContainer.<init>(WireFormats.java:241)
	at akka.remote.WireFormats$AckAndEnvelopeContainer.<init>(WireFormats.java:175)
	at akka.remote.WireFormats$AckAndEnvelopeContainer$1.parsePartialFrom(WireFormats.java:279)
	at akka.remote.WireFormats$AckAndEnvelopeContainer$1.parsePartialFrom(WireFormats.java:274)
	at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:141)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
	at akka.remote.WireFormats$AckAndEnvelopeContainer.parseFrom(WireFormats.java:409)
	at akka.remote.transport.AkkaPduProtobufCodec$.decodeMessage(AkkaPduCodec.scala:181)
	at akka.remote.EndpointReader.akka$remote$EndpointReader$$tryDecodeMessageAndAck(Endpoint.scala:995)
	at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:928)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
	at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
	at akka.dispatch.Mailbox.run(Mailbox.scala:221)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
2016-06-30 14:45:18,502 INFO org.apache.flink.yarn.YarnJobManager - Stopping JobManager akka.tcp://flink@172.31.23.121:45569/user/jobmanager.
2016-06-30 14:45:18,533 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom File Source (1/1) (5f2a1062c796ec6098a0a88227b9eab4) switched from RUNNING to CANCELING
{code}

The JobManager JVM keeps running (keeping the YARN session alive) because the web monitor is not stopped on such errors.
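The failure mode described above can be illustrated with a small, self-contained Java sketch. It assumes, without confirmation from this report, that the web monitor serves requests on a non-daemon thread. A JVM only exits on its own once every non-daemon thread has finished (or `System.exit` is called), so such a thread outlives an actor system that has already shut down. All class and method names below (`FakeActorSystem`, `registerOnTermination`, `failFatally`, `FakeWebMonitor`, `webMonitorAliveAfterFatalError`) are hypothetical stand-ins, not Flink or Akka APIs, and the sketch is not the actual fix for this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Sketch with hypothetical names; not Flink's or Akka's API. */
public class FatalErrorShutdownSketch {

    /** Hypothetical stand-in for an actor system that shuts itself down on a fatal error. */
    static final class FakeActorSystem {
        private final List<Runnable> terminationCallbacks = new ArrayList<>();

        void registerOnTermination(Runnable callback) {
            terminationCallbacks.add(callback);
        }

        /** Simulates "Uncaught fatal error ... shutting down ActorSystem". */
        void failFatally(Throwable error) {
            // Only components that registered a callback are stopped here;
            // anything else keeps running after the actor system is gone.
            for (Runnable callback : terminationCallbacks) {
                callback.run();
            }
        }
    }

    /** Hypothetical stand-in for the web frontend, owning a non-daemon thread. */
    static final class FakeWebMonitor {
        private final CountDownLatch stopSignal = new CountDownLatch(1);
        private final Thread serverThread = new Thread(() -> {
            try {
                stopSignal.await(); // "serve requests" until stop() is called
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "fake-web-monitor");

        void start() {
            serverThread.setDaemon(false); // non-daemon: keeps the JVM alive
            serverThread.start();
        }

        void stop() {
            stopSignal.countDown();
            try {
                serverThread.join(5_000); // wait up to 5 s for the thread to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        boolean isRunning() {
            return serverThread.isAlive();
        }
    }

    /** Returns true if the web monitor thread is still running after the fatal error. */
    static boolean webMonitorAliveAfterFatalError(boolean stopOnTermination) {
        FakeActorSystem actorSystem = new FakeActorSystem();
        FakeWebMonitor webMonitor = new FakeWebMonitor();
        webMonitor.start();
        if (stopOnTermination) {
            // Tie the web monitor's lifecycle to the actor system's termination.
            actorSystem.registerOnTermination(webMonitor::stop);
        }
        actorSystem.failFatally(new OutOfMemoryError("Java heap space"));
        boolean alive = webMonitor.isRunning();
        webMonitor.stop(); // clean up so this demo itself can exit
        return alive;
    }

    public static void main(String[] args) {
        System.out.println("without callback, still running: " + webMonitorAliveAfterFatalError(false));
        System.out.println("with callback, still running: " + webMonitorAliveAfterFatalError(true));
    }
}
```

Running `main` prints `true` for the scenario without a termination callback and `false` for the scenario with one. Tying the web monitor's shutdown to the actor system's termination is only one option; another is to terminate the whole process when a fatal error such as an `OutOfMemoryError` is observed.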