Hi Till,

I also tried the job without gzip, and it ran into the same error.
But the problem is solved now. Just as I was about to give up, I found this mail:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/JVM-crash-SIGSEGV-in-ZIP-GetEntry-td17326.html
So I think it was probably related to serialization. Here is what I changed:

Before:

    OperatorStateStore stateStore = context.getOperatorStateStore();
    ListStateDescriptor lsd = new ListStateDescriptor("bucket-states", State.class);

After:

    OperatorStateStore stateStore = context.getOperatorStateStore();
    ListStateDescriptor lsd = new ListStateDescriptor("bucket-states", new JavaSerializer());

Hope this is helpful.

Yours sincerely
Josh

Till Rohrmann <trohrm...@apache.org> wrote on Tue, May 18, 2021 at 2:54 PM:

> Hi Joshua,
>
> could you try whether the job also fails when not using the gzip format?
> This could help us narrow down the culprit. Moreover, you could try to run
> your job and Flink with Java 11 now.
>
> Cheers,
> Till
>
> On Tue, May 18, 2021 at 5:10 AM Joshua Fan <joshuafat...@gmail.com> wrote:
>
>> Hi all,
>>
>> Most of the posts say that "Most of the times, the crashes in
>> ZIP_GetEntry occur when the jar file being accessed has been
>> modified/overwritten while the JVM instance was running.", but I do not
>> know when and which jar file was modified by the job running in
>> flink.
>>
>> For your information.
>>
>> Yours sincerely
>> Josh
>>
>> Joshua Fan <joshuafat...@gmail.com> wrote on Tue, May 18, 2021 at 10:15 AM:
>>
>>> Hi Stephan, Till
>>>
>>> Recently, I tried to upgrade a flink job from 1.7 to 1.11,
>>> unfortunately, the weird problem appeared, "SIGSEGV (0xb) at
>>> pc=0x0000000000000025, pid=135306, tid=140439001388800". The pid log is
>>> attached.
>>> Actually, it is a simple job that consumes messages from kafka and
>>> writes into hdfs with a gzip format. It can run in 1.11 for about 2
>>> minutes, then the JVM will crash, then job restart and jvm crash again
>>> until the application fails.
>>> I also tried to set -Dsun.zip.disableMemoryMapping=true, but it did
>>> not help, the same crash keeps happening. Google suggests to upgrade
>>> jdk to jdk1.9, but it is not feasible.
>>> Any suggestions? Thanks a lot.
>>>
>>> Yours sincerely
>>> Josh
>>>
>>> Stephan Ewen <se...@apache.org> wrote on Fri, Sep 13, 2019 at 11:11 PM:
>>>
>>>> Given that the segfault happens in the JVM's ZIP stream code, I am
>>>> curious if this is a bug in Flink or in the JVM core libs, that happens to
>>>> be triggered now by newer versions of Flink.
>>>>
>>>> I found this on StackOverflow, which looks like it could be related:
>>>> https://stackoverflow.com/questions/38326183/jvm-crashed-in-java-util-zip-zipfile-getentry
>>>> Can you try the suggested option "-Dsun.zip.disableMemoryMapping=true"?
>>>>
>>>>
>>>> On Fri, Sep 13, 2019 at 11:36 AM Till Rohrmann <trohrm...@apache.org> wrote:
>>>>
>>>>> Hi Marek,
>>>>>
>>>>> could you share the log statements which happened before the SIGSEGV
>>>>> with us? They might be helpful to understand what happened before.
>>>>> Moreover, it would be helpful to get access to your custom serializer
>>>>> implementations. I'm also pulling in Gordon who worked on
>>>>> the TypeSerializerSnapshot improvements.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Thu, Sep 12, 2019 at 9:28 AM Marek Maj <marekm...@gmail.com> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Recently we decided to upgrade from flink 1.7.2 to 1.8.1. After the
>>>>>> upgrade our task managers started to fail with SIGSEGV error from time to
>>>>>> time.
>>>>>>
>>>>>> In the process of adjusting the code to 1.8.1, we noticed that there were
>>>>>> some changes around TypeSerializerSnapshot interface and its
>>>>>> implementations. At that time we had a few custom serializers which we
>>>>>> decided to throw out during migration and then leverage flink default
>>>>>> serializers. We don't mind clearing the state in the process of
>>>>>> migration; an effort to migrate with state does not seem worth it.
>>>>>>
>>>>>> Unfortunately after running the new version we see SIGSEGV errors from
>>>>>> time to time. It may be that serialization is not the real cause, but at
>>>>>> the moment it seems to be the most probable reason. We have not performed
>>>>>> any significant code changes besides the serialization area.
>>>>>>
>>>>>> We run the job on yarn, hdp version 2.7.3.2.6.2.0-205.
>>>>>> Checkpoint configuration: RocksDB backend, not incremental, 50s min
>>>>>> processing time
>>>>>>
>>>>>> You can find parts of the JobManager log and the ErrorFile log of the failed
>>>>>> container included below.
>>>>>>
>>>>>> Any suggestions are welcome
>>>>>>
>>>>>> Best regards
>>>>>> Marek Maj
>>>>>>
>>>>>> jobmanager.log
>>>>>>
>>>>>> 2019-09-10 16:30:28.177 INFO o.a.f.r.c.CheckpointCoordinator - Completed checkpoint 47 for job c8a9ae03785ade86348c3189cf7dd965 (18532488122 bytes in 60871 ms).
>>>>>>
>>>>>> 2019-09-10 16:31:19.223 INFO o.a.f.r.c.CheckpointCoordinator - Triggering checkpoint 48 @ 1568111478177 for job c8a9ae03785ade86348c3189cf7dd965.
>>>>>>
>>>>>> 2019-09-10 16:32:19.280 INFO o.a.f.r.c.CheckpointCoordinator - Completed checkpoint 48 for job c8a9ae03785ade86348c3189cf7dd965 (19049515705 bytes in 61083 ms).
>>>>>>
>>>>>> 2019-09-10 16:33:10.480 INFO o.a.f.r.c.CheckpointCoordinator - Triggering checkpoint 49 @ 1568111589279 for job c8a9ae03785ade86348c3189cf7dd965.
>>>>>>
>>>>>> 2019-09-10 16:33:36.773 WARN o.a.f.r.r.h.l.m.MetricFetcherImpl - Requesting TaskManager's path for query services failed.
>>>>>> java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>>>>>>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>>>>>>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>>>>>>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>>>>>>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>>>>>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>>>>>     at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:816)
>>>>>>     at akka.dispatch.OnComplete.internal(Future.scala:258)
>>>>>>     at akka.dispatch.OnComplete.internal(Future.scala:256)
>>>>>>     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
>>>>>>     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
>>>>>>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>>>>>>     at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
>>>>>>     at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>>>>>>     at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>>>>>>     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
>>>>>>     at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>>>>>>     at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>>>>>>     at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>>>>>>     at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>>>>>>     at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>>>>>>     at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>>>>>>     at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>>>>>>     at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>>>>>>     ... 9 common frames omitted
>>>>>>
>>>>>> 2019-09-10 16:33:48.782 WARN o.a.f.r.r.h.l.m.MetricFetcherImpl - Requesting TaskManager's path for query services failed.
>>>>>> java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>>>>>>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>>>>>>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>>>>>>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>>>>>>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>>>>>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>>>>>     at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:816)
>>>>>>     at akka.dispatch.OnComplete.internal(Future.scala:258)
>>>>>>     at akka.dispatch.OnComplete.internal(Future.scala:256)
>>>>>>     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
>>>>>>     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
>>>>>>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>>>>>>     at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
>>>>>>     at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>>>>>>     at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>>>>>>     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
>>>>>>     at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>>>>>>     at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>>>>>>     at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>>>>>>     at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>>>>>>     at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>>>>>>     at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>>>>>>     at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>>>>>>     at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>>>>>>     ... 9 common frames omitted
>>>>>>
>>>>>> 2019-09-10 16:34:00.802 WARN o.a.f.r.r.h.l.m.MetricFetcherImpl - Requesting TaskManager's path for query services failed.
>>>>>> java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>>>>>>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>>>>>>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>>>>>>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>>>>>>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>>>>>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>>>>>     at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:816)
>>>>>>     at akka.dispatch.OnComplete.internal(Future.scala:258)
>>>>>>     at akka.dispatch.OnComplete.internal(Future.scala:256)
>>>>>>     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
>>>>>>     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
>>>>>>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>>>>>>     at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
>>>>>>     at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>>>>>>     at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>>>>>>     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
>>>>>>     at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>>>>>>     at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>>>>>>     at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>>>>>>     at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>>>>>>     at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>>>>>>     at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>>>>>>     at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>>>>>>     at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>>>>>>     ... 9 common frames omitted
>>>>>>
>>>>>> 2019-09-10 16:34:03.800 INFO o.a.flink.yarn.YarnResourceManager - The heartbeat of TaskManager with id container_e67_1568017536744_0044_01_000023 timed out.
>>>>>>
>>>>>> 2019-09-10 16:34:03.801 INFO o.a.flink.yarn.YarnResourceManager - Closing TaskExecutor connection container_e67_1568017536744_0044_01_000023 because: The heartbeat of TaskManager with id container_e67_1568017536744_0044_01_000023 timed out.
>>>>>>
>>>>>> 2019-09-10 16:34:03.803 INFO o.a.f.r.e.ExecutionGraph - my-function (1/32) (ae416d03ddc94a3633673c4050b8f2ae) switched from RUNNING to FAILED.
>>>>>> org.apache.flink.util.FlinkException: The assigned slot container_e67_1568017536744_0044_01_000023_0 was removed.
>>>>>>     at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:899)
>>>>>>     at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:869)
>>>>>>     at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:1080)
>>>>>>     at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:391)
>>>>>>     at org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:845)
>>>>>>     at org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(ResourceManager.java:1187)
>>>>>>     at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:318)
>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:392)
>>>>>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:185)
>>>>>>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>>>>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
>>>>>>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
>>>>>>     at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>>>>>>     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>>>>>>     at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>>>>>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>>>>>>     at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>>>>>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>>>>>>     at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>>>>>>     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>>
>>>>>> 2019-09-10 16:34:03.803 INFO o.a.f.r.c.CheckpointCoordinator - Discarding checkpoint 49 of job c8a9ae03785ade86348c3189cf7dd965.
>>>>>> org.apache.flink.util.FlinkException: The assigned slot container_e67_1568017536744_0044_01_000023_0 was removed.
>>>>>>     at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:899)
>>>>>>     at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:869)
>>>>>>     at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:1080)
>>>>>>     at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:391)
>>>>>>     at org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:845)
>>>>>>     at org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(ResourceManager.java:1187)
>>>>>>     at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:318)
>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:392)
>>>>>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:185)
>>>>>>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>>>>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
>>>>>>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
>>>>>>     at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>>>>>>     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>>>>>>     at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>>>>>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>>>>>>     at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>>>>>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>>>>>>     at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>>>>>>     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>>
>>>>>> 2019-09-10 16:34:03.803 INFO o.a.f.r.e.ExecutionGraph - Job ProcessingJob (c8a9ae03785ade86348c3189cf7dd965) switched from state RUNNING to FAILING.
>>>>>> org.apache.flink.util.FlinkException: The assigned slot container_e67_1568017536744_0044_01_000023_0 was removed.
>>>>>>     at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:899)
>>>>>>     at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:869)
>>>>>>     at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:1080)
>>>>>>     at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:391)
>>>>>>     at org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:845)
>>>>>>     at org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(ResourceManager.java:1187)
>>>>>>     at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:318)
>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:392)
>>>>>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:185)
>>>>>>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>>>>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
>>>>>>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
>>>>>>     at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>>>>>>     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>>>>>>     at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>>>>>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>>>>>>     at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>>>>>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>>>>>>     at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>>>>>>     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>>
>>>>>>
>>>>>> hs_err_pid_262348.log for failed container
>>>>>>
>>>>>> #
>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>> #
>>>>>> # SIGSEGV (0xb) at pc=0x00007f294944b2c2, pid=262348, tid=0x00007f2916833700
>>>>>> #
>>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0_112-b15) (build 1.8.0_112-b15)
>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.112-b15 mixed mode linux-amd64 compressed oops)
>>>>>> # Problematic frame:
>>>>>> # C [libzip.so+0xb2c2] inflateEnd+0x32
>>>>>> #
>>>>>> # Core dump written. Default location: /data/hadoop/yarn/local/usercache/flink/appcache/application_1568017536744_0044/container_e67_1568017536744_0044_01_000023/core or core.262348
>>>>>> #
>>>>>> # If you would like to submit a bug report, please visit:
>>>>>> # http://bugreport.java.com/bugreport/crash.jsp
>>>>>> # The crash happened outside the Java Virtual Machine in native code.
>>>>>> # See problematic frame for where to report the bug.
>>>>>> #
>>>>>>
>>>>>> --------------- T H R E A D ---------------
>>>>>>
>>>>>> Current thread (0x00007f29440e8000): JavaThread "Finalizer" daemon [_thread_in_native, id=262401, stack(0x00007f2916733000,0x00007f2916834000)]
>>>>>>
>>>>>> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000001080
>>>>>>
>>>>>> Registers:
>>>>>> RAX=0x00007f0100000001, RBX=0x00007f2945e52770, RCX=0x0000000000000180, RDX=0x00007f2945e52770
>>>>>> RSP=0x00007f29168323d0, RBP=0x00007f29168323e0, RSI=0x0000000000001040, RDI=0x00007f2945e52770
>>>>>> R8 =0x00000007bff0f170, R9 =0x0000000000000006, R10=0x00007f2935017a08, R11=0x00007f294b583d50
>>>>>> R12=0x00007f29440e81f8, R13=0x00007f293135cc58, R14=0x00007f2916832490, R15=0x00007f29440e8000
>>>>>> RIP=0x00007f294944b2c2, EFLAGS=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
>>>>>> TRAPNO=0x000000000000000e
>>>>>>
>>>>>> Top of Stack: (sp=0x00007f29168323d0)
>>>>>> 0x00007f29168323d0: ffffffff440e8000 00007f2945e52770
>>>>>> 0x00007f29168323e0: 00007f2916832400 00007f294944338e
>>>>>> 0x00007f29168323f0: 00007f293135cc58 0000000000000000
>>>>>> 0x00007f2916832400: 00007f2916832468 00007f2935017a34
>>>>>> 0x00007f2916832410: 00007f2916832540 00007f293501306d
>>>>>> 0x00007f2916832420: 00007f29350055d0 00007f2916832428
>>>>>> 0x00007f2916832430: 0000000000000000 00007f2916832490
>>>>>> 0x00007f2916832440: 00007f293135cd70 0000000000000000
>>>>>> 0x00007f2916832450: 00007f293135cc58 0000000000000000
>>>>>> 0x00007f2916832460: 00007f2916832488 00007f29168324e8
>>>>>> 0x00007f2916832470: 00007f29350082bd 00000006ab616900
>>>>>> 0x00007f2916832480: 00007f2935011538 00007f2945e52770
>>>>>> 0x00007f2916832490: 00000007bff0f1e8 00000007bff0f1e8
>>>>>> 0x00007f29168324a0: 00000007bff0f1e8 00007f2916832498
>>>>>> 0x00007f29168324b0: 00007f293135c5e5 00007f2916832518
>>>>>> 0x00007f29168324c0: 00007f293135cd70 00007f29313f9840
>>>>>> 0x00007f29168324d0: 00007f293135c618 00007f2916832488
>>>>>> 0x00007f29168324e0: 00007f2916832518 00007f2916832580
>>>>>> 0x00007f29168324f0: 00007f29350082bd 0000000000000000
>>>>>> 0x00007f2916832500: 00007f2945e52770 0000000000000000
>>>>>> 0x00007f2916832510: 00000007bff0f1e8 00000007bff0cd38
>>>>>> 0x00007f2916832520: 0000000000000009 00000007bff0f158
>>>>>> 0x00007f2916832530: 0000006ce4720709 00000007bff0cd98
>>>>>> 0x00007f2916832540: 00007f2916832520 00007f293132f631
>>>>>> 0x00007f2916832550: 00007f29168325d8 00007f2931330ce0
>>>>>> 0x00007f2916832560: 0000000000000000 00007f293132f6c0
>>>>>> 0x00007f2916832570: 00007f2916832518 00007f29168325d8
>>>>>> 0x00007f2916832580: 00007f2916832620 00007f29350082bd
>>>>>> 0x00007f2916832590: 0000000000000000 0000000000000000
>>>>>> 0x00007f29168325a0: 0000000000000000 0000000000000000
>>>>>> 0x00007f29168325b0: 0000000000000000 0000000000000000
>>>>>> 0x00007f29168325c0: 00000007bff0f158 00000007bff0cd38
>>>>>>
>>>>>> Instructions: (pc=0x00007f294944b2c2)
>>>>>> 0x00007f294944b2a2: fe ff ff ff 48 83 c4 08 5b c9 c3 0f 1f 00 48 8b
>>>>>> 0x00007f294944b2b2: 77 28 48 85 f6 74 e8 48 8b 47 38 48 85 c0 74 df
>>>>>> 0x00007f294944b2c2: 48 8b 56 40 48 85 d2 74 11 48 89 d6 48 8b 7f 40
>>>>>> 0x00007f294944b2d2: ff d0 48 8b 43 38 48 8b 73 28 48 8b 7b 40 ff d0
>>>>>>
>>>>>> Register to memory mapping:
>>>>>>
>>>>>> RAX=0x00007f0100000001 is an unknown value
>>>>>> RBX=0x00007f2945e52770 is an unknown value
>>>>>> RCX=0x0000000000000180 is an unknown value
>>>>>> RDX=0x00007f2945e52770 is an unknown value
>>>>>> RSP=0x00007f29168323d0 is pointing into the stack for thread: 0x00007f29440e8000
>>>>>> RBP=0x00007f29168323e0 is pointing into the stack for thread: 0x00007f29440e8000
>>>>>> RSI=0x0000000000001040 is an unknown value
>>>>>> RDI=0x00007f2945e52770 is an unknown value
>>>>>> R8 =0x00000007bff0f170 is an oop
>>>>>> [Ljava.lang.Object;
>>>>>>  - klass: 'java/lang/Object'[]
>>>>>>  - length: 16
>>>>>> R9 =0x0000000000000006 is an unknown value
>>>>>> R10=0x00007f2935017a08 is at code_begin+808 in an Interpreter codelet
>>>>>> method entry point (kind = native) [0x00007f29350176e0, 0x00007f2935017fe0] 2304 bytes
>>>>>> R11=0x00007f294b583d50: <offset 0x9c3d50> in /usr/jdk64/jdk1.8.0_112/jre/lib/amd64/server/libjvm.so at 0x00007f294abc0000
>>>>>> R12=0x00007f29440e81f8 is an unknown value
>>>>>> R13={method} {0x00007f293135cc58} 'end' '(J)V' in 'java/util/zip/Inflater'
>>>>>> R14=0x00007f2916832490 is pointing into the stack for thread: 0x00007f29440e8000
>>>>>> R15=0x00007f29440e8000 is a thread
>>>>>>
>>>>>>
>>>>>> Stack: [0x00007f2916733000,0x00007f2916834000], sp=0x00007f29168323d0, free space=1020k
>>>>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>>>>>> C [libzip.so+0xb2c2] inflateEnd+0x32
>>>>>> C [libzip.so+0x338e] Java_java_util_zip_Inflater_end+0x1e
>>>>>> j java.util.zip.Inflater.end(J)V+0
>>>>>> j java.util.zip.Inflater.end()V+29
>>>>>> j java.util.zip.ZipFile.close()V+169
>>>>>> j sun.net.www.protocol.jar.URLJarFile.close()V+18
>>>>>> j sun.net.www.protocol.jar.URLJarFile.finalize()V+1
>>>>>> J 9535% C2 java.lang.ref.Finalizer$FinalizerThread.run()V (55 bytes) @ 0x00007f293674cec0 [0x00007f293674cc00+0x2c0]
>>>>>> v ~StubRoutines::call_stub
>>>>>> V [libjvm.so+0x690c66] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1056
>>>>>> V [libjvm.so+0x691171] JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
>>>>>> V [libjvm.so+0x691617] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x47
>>>>>> V [libjvm.so+0x72c990] thread_entry(JavaThread*, Thread*)+0xa0
>>>>>> V [libjvm.so+0xa755f3] JavaThread::thread_main_inner()+0x103
>>>>>> V [libjvm.so+0xa7573c] JavaThread::run()+0x11c
>>>>>> V [libjvm.so+0x926138] java_start(Thread*)+0x108
>>>>>> C [libpthread.so.0+0x7e25] start_thread+0xc5
>>>>>>
>>>>>> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
>>>>>> j java.util.zip.Inflater.end(J)V+0
>>>>>> j java.util.zip.Inflater.end()V+29
>>>>>> j java.util.zip.ZipFile.close()V+169
>>>>>> j sun.net.www.protocol.jar.URLJarFile.close()V+18
>>>>>> j sun.net.www.protocol.jar.URLJarFile.finalize()V+1
>>>>>> J 9535% C2 java.lang.ref.Finalizer$FinalizerThread.run()V (55 bytes) @ 0x00007f293674cec0 [0x00007f293674cc00+0x2c0]
>>>>>> v ~StubRoutines::call_stub
>>>>>>
>>>>>