Hi Till,
I also tried the job without gzip, and it ran into the same error.
But the problem is solved now. Just as I was about to give up, I found
the mail at
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/JVM-crash-SIGSEGV-in-ZIP-GetEntry-td17326.html.
So I think it was probably something related to serialization.
Here is what I changed:
before:

OperatorStateStore stateStore = context.getOperatorStateStore();
ListStateDescriptor lsd = new ListStateDescriptor("bucket-states",State.class);

after:

OperatorStateStore stateStore = context.getOperatorStateStore();
ListStateDescriptor lsd = new ListStateDescriptor("bucket-states", new JavaSerializer());
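
For readers unfamiliar with it: a serializer built on plain Java serialization essentially performs the round trip below. This is only a sketch using the JDK, not Flink code. BucketState, serialize and deserialize are made-up names for illustration, and the real State class and JavaSerializer used in the job may look different.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class JavaSerializationSketch {

    /** Hypothetical stand-in for the sink's bucket state; not the real State class. */
    public static class BucketState implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String currentFile;
        public final long validLength;

        public BucketState(String currentFile, long validLength) {
            this.currentFile = currentFile;
            this.validLength = validLength;
        }
    }

    /** Writes the object with standard Java serialization and returns the bytes. */
    public static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and therefore flushed) before the bytes are read.
        return buffer.toByteArray();
    }

    /** Reads an object back from bytes produced by serialize(). */
    public static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        BucketState before = new BucketState("part-0-0.gz", 1024L);
        BucketState after = (BucketState) deserialize(serialize(before));
        System.out.println(after.currentFile + " " + after.validLength);
    }
}
```

Passing a serializer explicitly to the descriptor, instead of a class, means the state is written with that serializer rather than one derived from the class.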

Hope this is helpful.

Yours sincerely
Josh



On Tue, May 18, 2021 at 2:54 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Joshua,
>
> could you try whether the job also fails when not using the gzip format?
> This could help us narrow down the culprit. Moreover, you could try to run
> your job and Flink with Java 11 now.
>
> Cheers,
> Till
>
> On Tue, May 18, 2021 at 5:10 AM Joshua Fan <joshuafat...@gmail.com> wrote:
>
>> Hi all,
>>
>> Most of the posts say that "Most of the times, the crashes in
>> ZIP_GetEntry occur when the jar file being accessed has been
>> modified/overwritten while the JVM instance was running.", but I do not
>> know when or which jar file was modified while the job was running in
>> Flink.
>>
>> Just for your information.
>>
>> Yours sincerely
>> Josh
>>
>> On Tue, May 18, 2021 at 10:15 AM Joshua Fan <joshuafat...@gmail.com> wrote:
>>
>>> Hi Stephan, Till
>>>
>>> Recently, I tried to upgrade a Flink job from 1.7 to 1.11.
>>> Unfortunately, a weird problem appeared: "SIGSEGV (0xb) at
>>> pc=0x0000000000000025, pid=135306, tid=140439001388800". The pid log is
>>> attached.
>>> It is a simple job that consumes messages from Kafka and
>>> writes them to HDFS in gzip format. It runs on 1.11 for about 2
>>> minutes, then the JVM crashes; the job restarts and the JVM crashes again,
>>> until the application fails.
>>> I also tried setting -Dsun.zip.disableMemoryMapping=true, but it did
>>> not help; the same crash keeps happening. Google suggests upgrading the
>>> JDK to jdk1.9, but that is not feasible.
>>> Any suggestions? Thanks a lot.
>>>
>>> Yours sincerely
>>> Josh
>>>
>>> On Fri, Sep 13, 2019 at 11:11 PM Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> Given that the segfault happens in the JVM's ZIP stream code, I am
>>>> curious whether this is a bug in Flink or in the JVM core libs that happens
>>>> to be triggered now by newer versions of Flink.
>>>>
>>>> I found this on StackOverflow, which looks like it could be related:
>>>> https://stackoverflow.com/questions/38326183/jvm-crashed-in-java-util-zip-zipfile-getentry
>>>> Can you try the suggested option "-Dsun.zip.disableMemoryMapping=true"?
>>>>
>>>>
>>>> On Fri, Sep 13, 2019 at 11:36 AM Till Rohrmann <trohrm...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Marek,
>>>>>
>>>>> could you share the log statements that were written before the SIGSEGV
>>>>> with us? They might help us understand what happened before the crash.
>>>>> Moreover, it would be helpful to get access to your custom serializer
>>>>> implementations. I'm also pulling in Gordon who worked on
>>>>> the TypeSerializerSnapshot improvements.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Thu, Sep 12, 2019 at 9:28 AM Marek Maj <marekm...@gmail.com> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Recently we decided to upgrade from Flink 1.7.2 to 1.8.1. After the
>>>>>> upgrade, our task managers started to fail with a SIGSEGV error from time
>>>>>> to time.
>>>>>>
>>>>>> In the process of adjusting the code to 1.8.1, we noticed that there were
>>>>>> some changes around the TypeSerializerSnapshot interface and its
>>>>>> implementations. At that time we had a few custom serializers, which we
>>>>>> decided to throw out during the migration in favor of Flink's default
>>>>>> serializers. We don't mind clearing the state in the process of
>>>>>> migration; the effort to migrate with state does not seem worth it.
>>>>>>
>>>>>> Unfortunately, after running the new version we see SIGSEGV errors from
>>>>>> time to time. It may be that serialization is not the real cause, but at
>>>>>> the moment it seems to be the most probable reason. We have not made
>>>>>> any significant code changes besides the serialization area.
>>>>>>
>>>>>> We run the job on YARN, HDP version 2.7.3.2.6.2.0-205.
>>>>>> Checkpoint configuration: RocksDB backend, not incremental, 50s min
>>>>>> processing time
>>>>>>
>>>>>> You can find parts of JobManager log and ErrorFile log of failed
>>>>>> container included below.
>>>>>>
>>>>>> Any suggestions are welcome
>>>>>>
>>>>>> Best regards
>>>>>> Marek Maj
>>>>>>
>>>>>> jobmanager.log
>>>>>>
>>>>>> 2019-09-10 16:30:28.177 INFO  o.a.f.r.c.CheckpointCoordinator   -
>>>>>> Completed checkpoint 47 for job c8a9ae03785ade86348c3189cf7dd965
>>>>>> (18532488122 bytes in 60871 ms).
>>>>>>
>>>>>> 2019-09-10 16:31:19.223 INFO  o.a.f.r.c.CheckpointCoordinator   -
>>>>>> Triggering checkpoint 48 @ 1568111478177 for job
>>>>>> c8a9ae03785ade86348c3189cf7dd965.
>>>>>>
>>>>>> 2019-09-10 16:32:19.280 INFO  o.a.f.r.c.CheckpointCoordinator   -
>>>>>> Completed checkpoint 48 for job c8a9ae03785ade86348c3189cf7dd965
>>>>>> (19049515705 bytes in 61083 ms).
>>>>>>
>>>>>> 2019-09-10 16:33:10.480 INFO  o.a.f.r.c.CheckpointCoordinator   -
>>>>>> Triggering checkpoint 49 @ 1568111589279 for job
>>>>>> c8a9ae03785ade86348c3189cf7dd965.
>>>>>>
>>>>>> 2019-09-10 16:33:36.773 WARN  o.a.f.r.r.h.l.m.MetricFetcherImpl   -
>>>>>> Requesting TaskManager's path for query services failed.
>>>>>>
>>>>>> java.util.concurrent.CompletionException:
>>>>>> akka.pattern.AskTimeoutException: Ask timed out on
>>>>>> [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms].
>>>>>> Sender[null] sent message of type
>>>>>> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:816)
>>>>>>
>>>>>> at akka.dispatch.OnComplete.internal(Future.scala:258)
>>>>>>
>>>>>> at akka.dispatch.OnComplete.internal(Future.scala:256)
>>>>>>
>>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
>>>>>>
>>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
>>>>>>
>>>>>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>>>>>>
>>>>>> at
>>>>>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
>>>>>>
>>>>>> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>>>>>>
>>>>>> at
>>>>>> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>>>>>>
>>>>>> at
>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>>>>>>
>>>>>> at
>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>>>>>>
>>>>>> at
>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>>>>>>
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>
>>>>>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
>>>>>> [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms].
>>>>>> Sender[null] sent message of type
>>>>>> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>
>>>>>> at
>>>>>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>>>>>>
>>>>>> ... 9 common frames omitted
>>>>>>
>>>>>> 2019-09-10 16:33:48.782 WARN  o.a.f.r.r.h.l.m.MetricFetcherImpl   -
>>>>>> Requesting TaskManager's path for query services failed.
>>>>>>
>>>>>> java.util.concurrent.CompletionException:
>>>>>> akka.pattern.AskTimeoutException: Ask timed out on
>>>>>> [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms].
>>>>>> Sender[null] sent message of type
>>>>>> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:816)
>>>>>>
>>>>>> at akka.dispatch.OnComplete.internal(Future.scala:258)
>>>>>>
>>>>>> at akka.dispatch.OnComplete.internal(Future.scala:256)
>>>>>>
>>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
>>>>>>
>>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
>>>>>>
>>>>>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>>>>>>
>>>>>> at
>>>>>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
>>>>>>
>>>>>> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>>>>>>
>>>>>> at
>>>>>> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>>>>>>
>>>>>> at
>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>>>>>>
>>>>>> at
>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>>>>>>
>>>>>> at
>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>>>>>>
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>
>>>>>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
>>>>>> [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms].
>>>>>> Sender[null] sent message of type
>>>>>> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>
>>>>>> at
>>>>>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>>>>>>
>>>>>> ... 9 common frames omitted
>>>>>>
>>>>>> 2019-09-10 16:34:00.802 WARN  o.a.f.r.r.h.l.m.MetricFetcherImpl   -
>>>>>> Requesting TaskManager's path for query services failed.
>>>>>>
>>>>>> java.util.concurrent.CompletionException:
>>>>>> akka.pattern.AskTimeoutException: Ask timed out on
>>>>>> [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms].
>>>>>> Sender[null] sent message of type
>>>>>> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:816)
>>>>>>
>>>>>> at akka.dispatch.OnComplete.internal(Future.scala:258)
>>>>>>
>>>>>> at akka.dispatch.OnComplete.internal(Future.scala:256)
>>>>>>
>>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
>>>>>>
>>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
>>>>>>
>>>>>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>>>>>>
>>>>>> at
>>>>>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
>>>>>>
>>>>>> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>>>>>>
>>>>>> at
>>>>>> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>>>>>>
>>>>>> at
>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>>>>>>
>>>>>> at
>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>>>>>>
>>>>>> at
>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>>>>>>
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>
>>>>>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
>>>>>> [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms].
>>>>>> Sender[null] sent message of type
>>>>>> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>
>>>>>> at
>>>>>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>>>>>>
>>>>>> ... 9 common frames omitted
>>>>>>
>>>>>> 2019-09-10 16:34:03.800 INFO  o.a.flink.yarn.YarnResourceManager   -
>>>>>> The heartbeat of TaskManager with id
>>>>>> container_e67_1568017536744_0044_01_000023 timed out.
>>>>>>
>>>>>> 2019-09-10 16:34:03.801 INFO  o.a.flink.yarn.YarnResourceManager   -
>>>>>> Closing TaskExecutor connection container_e67_1568017536744_0044_01_000023
>>>>>> because: The heartbeat of TaskManager with id
>>>>>> container_e67_1568017536744_0044_01_000023  timed out.
>>>>>>
>>>>>> 2019-09-10 16:34:03.803 INFO  o.a.f.r.e.ExecutionGraph   -
>>>>>> my-function (1/32) (ae416d03ddc94a3633673c4050b8f2ae) switched from
>>>>>> RUNNING to FAILED.
>>>>>>
>>>>>> org.apache.flink.util.FlinkException: The assigned slot
>>>>>> container_e67_1568017536744_0044_01_000023_0 was removed.
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:899)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:869)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:1080)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:391)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:845)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(ResourceManager.java:1187)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:318)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>
>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:392)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:185)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
>>>>>>
>>>>>> at
>>>>>> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>>>>>>
>>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>>>>>>
>>>>>> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>>>>>>
>>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>>>>>>
>>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>>>>>>
>>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>>>>>>
>>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>>>>>>
>>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>>
>>>>>> 2019-09-10 16:34:03.803 INFO  o.a.f.r.c.CheckpointCoordinator   -
>>>>>> Discarding checkpoint 49 of job c8a9ae03785ade86348c3189cf7dd965.
>>>>>>
>>>>>> org.apache.flink.util.FlinkException: The assigned slot
>>>>>> container_e67_1568017536744_0044_01_000023_0 was removed.
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:899)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:869)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:1080)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:391)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:845)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(ResourceManager.java:1187)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:318)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>
>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:392)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:185)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
>>>>>>
>>>>>> at
>>>>>> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>>>>>>
>>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>>>>>>
>>>>>> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>>>>>>
>>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>>>>>>
>>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>>>>>>
>>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>>>>>>
>>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>>>>>>
>>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>>
>>>>>> 2019-09-10 16:34:03.803 INFO  o.a.f.r.e.ExecutionGraph   - Job
>>>>>> ProcessingJob (c8a9ae03785ade86348c3189cf7dd965) switched from state
>>>>>> RUNNING to FAILING.
>>>>>>
>>>>>> org.apache.flink.util.FlinkException: The assigned slot
>>>>>> container_e67_1568017536744_0044_01_000023_0 was removed.
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:899)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:869)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:1080)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:391)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:845)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(ResourceManager.java:1187)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:318)
>>>>>>
>>>>>> at
>>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>
>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:392)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:185)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
>>>>>>
>>>>>> at
>>>>>> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>>>>>>
>>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>>>>>>
>>>>>> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>>>>>>
>>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>>>>>>
>>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>>>>>>
>>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>>>>>>
>>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>>>>>>
>>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>
>>>>>> at
>>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>>
>>>>>>
>>>>>>
>>>>>> hs_err_pid_262348.log for failed container
>>>>>>
>>>>>> #
>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>> #
>>>>>> #  SIGSEGV (0xb) at pc=0x00007f294944b2c2, pid=262348,
>>>>>> tid=0x00007f2916833700
>>>>>> #
>>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0_112-b15) (build
>>>>>> 1.8.0_112-b15)
>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.112-b15 mixed mode
>>>>>> linux-amd64 compressed oops)
>>>>>> # Problematic frame:
>>>>>> # C  [libzip.so+0xb2c2]  inflateEnd+0x32
>>>>>> #
>>>>>> # Core dump written. Default location:
>>>>>> /data/hadoop/yarn/local/usercache/flink/appcache/application_1568017536744_0044/container_e67_1568017536744_0044_01_000023/core
>>>>>> or core.262348
>>>>>> #
>>>>>> # If you would like to submit a bug report, please visit:
>>>>>> #   http://bugreport.java.com/bugreport/crash.jsp
>>>>>> # The crash happened outside the Java Virtual Machine in native code.
>>>>>> # See problematic frame for where to report the bug.
>>>>>> #
>>>>>>
>>>>>> ---------------  T H R E A D  ---------------
>>>>>>
>>>>>> Current thread (0x00007f29440e8000):  JavaThread "Finalizer" daemon
>>>>>> [_thread_in_native, id=262401, stack(0x00007f2916733000,0x00007f2916834000)]
>>>>>>
>>>>>> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr:
>>>>>> 0x0000000000001080
>>>>>>
>>>>>> Registers:
>>>>>> RAX=0x00007f0100000001, RBX=0x00007f2945e52770,
>>>>>> RCX=0x0000000000000180, RDX=0x00007f2945e52770
>>>>>> RSP=0x00007f29168323d0, RBP=0x00007f29168323e0,
>>>>>> RSI=0x0000000000001040, RDI=0x00007f2945e52770
>>>>>> R8 =0x00000007bff0f170, R9 =0x0000000000000006,
>>>>>> R10=0x00007f2935017a08, R11=0x00007f294b583d50
>>>>>> R12=0x00007f29440e81f8, R13=0x00007f293135cc58,
>>>>>> R14=0x00007f2916832490, R15=0x00007f29440e8000
>>>>>> RIP=0x00007f294944b2c2, EFLAGS=0x0000000000010202,
>>>>>> CSGSFS=0x0000000000000033, ERR=0x0000000000000004
>>>>>>   TRAPNO=0x000000000000000e
>>>>>>
>>>>>> Top of Stack: (sp=0x00007f29168323d0)
>>>>>> 0x00007f29168323d0:   ffffffff440e8000 00007f2945e52770
>>>>>> 0x00007f29168323e0:   00007f2916832400 00007f294944338e
>>>>>> 0x00007f29168323f0:   00007f293135cc58 0000000000000000
>>>>>> 0x00007f2916832400:   00007f2916832468 00007f2935017a34
>>>>>> 0x00007f2916832410:   00007f2916832540 00007f293501306d
>>>>>> 0x00007f2916832420:   00007f29350055d0 00007f2916832428
>>>>>> 0x00007f2916832430:   0000000000000000 00007f2916832490
>>>>>> 0x00007f2916832440:   00007f293135cd70 0000000000000000
>>>>>> 0x00007f2916832450:   00007f293135cc58 0000000000000000
>>>>>> 0x00007f2916832460:   00007f2916832488 00007f29168324e8
>>>>>> 0x00007f2916832470:   00007f29350082bd 00000006ab616900
>>>>>> 0x00007f2916832480:   00007f2935011538 00007f2945e52770
>>>>>> 0x00007f2916832490:   00000007bff0f1e8 00000007bff0f1e8
>>>>>> 0x00007f29168324a0:   00000007bff0f1e8 00007f2916832498
>>>>>> 0x00007f29168324b0:   00007f293135c5e5 00007f2916832518
>>>>>> 0x00007f29168324c0:   00007f293135cd70 00007f29313f9840
>>>>>> 0x00007f29168324d0:   00007f293135c618 00007f2916832488
>>>>>> 0x00007f29168324e0:   00007f2916832518 00007f2916832580
>>>>>> 0x00007f29168324f0:   00007f29350082bd 0000000000000000
>>>>>> 0x00007f2916832500:   00007f2945e52770 0000000000000000
>>>>>> 0x00007f2916832510:   00000007bff0f1e8 00000007bff0cd38
>>>>>> 0x00007f2916832520:   0000000000000009 00000007bff0f158
>>>>>> 0x00007f2916832530:   0000006ce4720709 00000007bff0cd98
>>>>>> 0x00007f2916832540:   00007f2916832520 00007f293132f631
>>>>>> 0x00007f2916832550:   00007f29168325d8 00007f2931330ce0
>>>>>> 0x00007f2916832560:   0000000000000000 00007f293132f6c0
>>>>>> 0x00007f2916832570:   00007f2916832518 00007f29168325d8
>>>>>> 0x00007f2916832580:   00007f2916832620 00007f29350082bd
>>>>>> 0x00007f2916832590:   0000000000000000 0000000000000000
>>>>>> 0x00007f29168325a0:   0000000000000000 0000000000000000
>>>>>> 0x00007f29168325b0:   0000000000000000 0000000000000000
>>>>>> 0x00007f29168325c0:   00000007bff0f158 00000007bff0cd38
>>>>>>
>>>>>> Instructions: (pc=0x00007f294944b2c2)
>>>>>> 0x00007f294944b2a2:   fe ff ff ff 48 83 c4 08 5b c9 c3 0f 1f 00 48 8b
>>>>>> 0x00007f294944b2b2:   77 28 48 85 f6 74 e8 48 8b 47 38 48 85 c0 74 df
>>>>>> 0x00007f294944b2c2:   48 8b 56 40 48 85 d2 74 11 48 89 d6 48 8b 7f 40
>>>>>> 0x00007f294944b2d2:   ff d0 48 8b 43 38 48 8b 73 28 48 8b 7b 40 ff d0
>>>>>>
>>>>>> Register to memory mapping:
>>>>>>
>>>>>> RAX=0x00007f0100000001 is an unknown value
>>>>>> RBX=0x00007f2945e52770 is an unknown value
>>>>>> RCX=0x0000000000000180 is an unknown value
>>>>>> RDX=0x00007f2945e52770 is an unknown value
>>>>>> RSP=0x00007f29168323d0 is pointing into the stack for thread:
>>>>>> 0x00007f29440e8000
>>>>>> RBP=0x00007f29168323e0 is pointing into the stack for thread:
>>>>>> 0x00007f29440e8000
>>>>>> RSI=0x0000000000001040 is an unknown value
>>>>>> RDI=0x00007f2945e52770 is an unknown value
>>>>>> R8 =0x00000007bff0f170 is an oop
>>>>>> [Ljava.lang.Object;
>>>>>>  - klass: 'java/lang/Object'[]
>>>>>>  - length: 16
>>>>>> R9 =0x0000000000000006 is an unknown value
>>>>>> R10=0x00007f2935017a08 is at code_begin+808 in an Interpreter codelet
>>>>>> method entry point (kind = native)  [0x00007f29350176e0,
>>>>>> 0x00007f2935017fe0]  2304 bytes
>>>>>> R11=0x00007f294b583d50: <offset 0x9c3d50> in
>>>>>> /usr/jdk64/jdk1.8.0_112/jre/lib/amd64/server/libjvm.so at 0x00007f294abc0000
>>>>>> R12=0x00007f29440e81f8 is an unknown value
>>>>>> R13={method} {0x00007f293135cc58} 'end' '(J)V' in
>>>>>> 'java/util/zip/Inflater'
>>>>>> R14=0x00007f2916832490 is pointing into the stack for thread:
>>>>>> 0x00007f29440e8000
>>>>>> R15=0x00007f29440e8000 is a thread
>>>>>>
>>>>>>
>>>>>> Stack: [0x00007f2916733000,0x00007f2916834000],
>>>>>>  sp=0x00007f29168323d0,  free space=1020k
>>>>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code,
>>>>>> C=native code)
>>>>>> C  [libzip.so+0xb2c2]  inflateEnd+0x32
>>>>>> C  [libzip.so+0x338e]  Java_java_util_zip_Inflater_end+0x1e
>>>>>> j  java.util.zip.Inflater.end(J)V+0
>>>>>> j  java.util.zip.Inflater.end()V+29
>>>>>> j  java.util.zip.ZipFile.close()V+169
>>>>>> j  sun.net.www.protocol.jar.URLJarFile.close()V+18
>>>>>> j  sun.net.www.protocol.jar.URLJarFile.finalize()V+1
>>>>>> J 9535% C2 java.lang.ref.Finalizer$FinalizerThread.run()V (55 bytes)
>>>>>> @ 0x00007f293674cec0 [0x00007f293674cc00+0x2c0]
>>>>>> v  ~StubRoutines::call_stub
>>>>>> V  [libjvm.so+0x690c66]  JavaCalls::call_helper(JavaValue*,
>>>>>> methodHandle*, JavaCallArguments*, Thread*)+0x1056
>>>>>> V  [libjvm.so+0x691171]  JavaCalls::call_virtual(JavaValue*,
>>>>>> KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
>>>>>> V  [libjvm.so+0x691617]  JavaCalls::call_virtual(JavaValue*, Handle,
>>>>>> KlassHandle, Symbol*, Symbol*, Thread*)+0x47
>>>>>> V  [libjvm.so+0x72c990]  thread_entry(JavaThread*, Thread*)+0xa0
>>>>>> V  [libjvm.so+0xa755f3]  JavaThread::thread_main_inner()+0x103
>>>>>> V  [libjvm.so+0xa7573c]  JavaThread::run()+0x11c
>>>>>> V  [libjvm.so+0x926138]  java_start(Thread*)+0x108
>>>>>> C  [libpthread.so.0+0x7e25]  start_thread+0xc5
>>>>>>
>>>>>> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
>>>>>> j  java.util.zip.Inflater.end(J)V+0
>>>>>> j  java.util.zip.Inflater.end()V+29
>>>>>> j  java.util.zip.ZipFile.close()V+169
>>>>>> j  sun.net.www.protocol.jar.URLJarFile.close()V+18
>>>>>> j  sun.net.www.protocol.jar.URLJarFile.finalize()V+1
>>>>>> J 9535% C2 java.lang.ref.Finalizer$FinalizerThread.run()V (55 bytes)
>>>>>> @ 0x00007f293674cec0 [0x00007f293674cc00+0x2c0]
>>>>>> v  ~StubRoutines::call_stub
>>>>>>
>>>>>
