Great to hear that you fixed the problem by specifying an explicit
serializer for the state.

Cheers,
Till

On Tue, May 18, 2021 at 9:43 AM Joshua Fan <joshuafat...@gmail.com> wrote:

> Hi Till,
> I also tried the job without gzip, it came into the same error.
> But the problem is solved now. I was about to give up to solve it, I found
> the mail at
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/JVM-crash-SIGSEGV-in-ZIP-GetEntry-td17326.html.
> So I think maybe it was something about the serialize staff.
> What I have done is :
> before:
>
> OperatorStateStore stateStore = context.getOperatorStateStore();
> ListStateDescriptor lsd = new 
> ListStateDescriptor("bucket-states",State.class);
>
> after:
>
> OperatorStateStore stateStore = context.getOperatorStateStore();
> ListStateDescriptor lsd = new ListStateDescriptor("bucket-states",new 
> JavaSerializer());
>
> Hope this is helpful.
>
> Yours sincerely
> Josh
>
>
>
> Till Rohrmann <trohrm...@apache.org> 于2021年5月18日周二 下午2:54写道:
>
>> Hi Joshua,
>>
>> could you try whether the job also fails when not using the gzip format?
>> This could help us narrow down the culprit. Moreover, you could try to run
>> your job and Flink with Java 11 now.
>>
>> Cheers,
>> Till
>>
>> On Tue, May 18, 2021 at 5:10 AM Joshua Fan <joshuafat...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> Most of the posts says that "Most of the times, the crashes in
>>> ZIP_GetEntry occur when the jar file being accessed has been
>>> modified/overwritten while the JVM instance was running. ", but do not
>>> know when and which jar file was modified according to the job running in
>>> flink.
>>>
>>> for your information.
>>>
>>> Yours sincerely
>>> Josh
>>>
>>> Joshua Fan <joshuafat...@gmail.com> 于2021年5月18日周二 上午10:15写道:
>>>
>>>> Hi Stephan, Till
>>>>
>>>> Recently, I tried to upgrade a flink job from 1.7 to 1.11,
>>>> unfortunately, the weird problem appeared, " SIGSEGV (0xb) at
>>>> pc=0x0000000000000025, pid=135306, tid=140439001388800".  The pid log is
>>>> attached.
>>>> Actually, it is a simple job that consumes messages from kafka and
>>>> writes into hdfs with a gzip format. It can run in 1.11 for about 2
>>>> minutes, then the JVM will crash, then job restart and jvm crash again
>>>> until the application fails.
>>>> I also tried to set -Dsun.zip.disableMemoryMapping=true,but it turns
>>>> out helpless, the same crash keeps happening. Google suggests to upgrade
>>>> jdk to jdk1.9, but it is not feasible.
>>>> Any suggestions? Thanks a lot.
>>>>
>>>> Yours sincerely
>>>> Josh
>>>>
>>>> Stephan Ewen <se...@apache.org> 于2019年9月13日周五 下午11:11写道:
>>>>
>>>>> Given that the segfault happens in the JVM's ZIP stream code, I am
>>>>> curious is this is a bug in Flink or in the JVM core libs, that happens to
>>>>> be triggered now by newer versions of FLink.
>>>>>
>>>>> I found this on StackOverflow, which looks like it could be related:
>>>>> https://stackoverflow.com/questions/38326183/jvm-crashed-in-java-util-zip-zipfile-getentry
>>>>> Can you try the suggested option "-Dsun.zip.disableMemoryMapping=true"
>>>>> ?
>>>>>
>>>>>
>>>>> On Fri, Sep 13, 2019 at 11:36 AM Till Rohrmann <trohrm...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi Marek,
>>>>>>
>>>>>> could you share the logs statements which happened before the SIGSEGV
>>>>>> with us? They might be helpful to understand what happened before.
>>>>>> Moreover, it would be helpful to get access to your custom serializer
>>>>>> implementations. I'm also pulling in Gordon who worked on
>>>>>> the TypeSerializerSnapshot improvements.
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>> On Thu, Sep 12, 2019 at 9:28 AM Marek Maj <marekm...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> Recently we decided to upgrade from flink 1.7.2 to 1.8.1. After an
>>>>>>> upgrade our task managers started to fail with SIGSEGV error from time 
>>>>>>> to
>>>>>>> time.
>>>>>>>
>>>>>>> In process of adjusting the code to 1.8.1, we noticed that there
>>>>>>> were some changes around TypeSerializerSnapshot interface and its
>>>>>>> implementations. At that time we had a few custom serializers which we
>>>>>>> decided to throw out during migration and then leverage flink default
>>>>>>> serializers. We don't mind clearing the state in the process of 
>>>>>>> migration,
>>>>>>> an effort to migrate with state seems to be not worth it.
>>>>>>>
>>>>>>> Unfortunately after running new version we see SIGSEGV errors from
>>>>>>> time to time. It may be that serialization is not the real cause, but at
>>>>>>> the moment it seems to be the most probable reason. We have not 
>>>>>>> performed
>>>>>>> any significant code changes besides serialization area.
>>>>>>>
>>>>>>> We run job on yarn, hdp version 2.7.3.2.6.2.0-205.
>>>>>>> Checkpoint configuration: RocksDB backend, not incremental, 50s min
>>>>>>> processing time
>>>>>>>
>>>>>>> You can find parts of JobManager log and ErrorFile log of failed
>>>>>>> container included below.
>>>>>>>
>>>>>>> Any suggestions are welcome
>>>>>>>
>>>>>>> Best regards
>>>>>>> Marek Maj
>>>>>>>
>>>>>>> jobmanager.log
>>>>>>>
>>>>>>> 019-09-10 16:30:28.177 INFO  o.a.f.r.c.CheckpointCoordinator   -
>>>>>>> Completed checkpoint 47 for job c8a9ae03785ade86348c3189cf7dd965
>>>>>>> (18532488122 bytes in 60871 ms).
>>>>>>>
>>>>>>> 2019-09-10 16:31:19.223 INFO  o.a.f.r.c.CheckpointCoordinator   -
>>>>>>> Triggering checkpoint 48 @ 1568111478177 for job
>>>>>>> c8a9ae03785ade86348c3189cf7dd965.
>>>>>>>
>>>>>>> 2019-09-10 16:32:19.280 INFO  o.a.f.r.c.CheckpointCoordinator   -
>>>>>>> Completed checkpoint 48 for job c8a9ae03785ade86348c3189cf7dd965
>>>>>>> (19049515705 bytes in 61083 ms).
>>>>>>>
>>>>>>> 2019-09-10 16:33:10.480 INFO  o.a.f.r.c.CheckpointCoordinator   -
>>>>>>> Triggering checkpoint 49 @ 1568111589279 for job
>>>>>>> c8a9ae03785ade86348c3189cf7dd965.
>>>>>>>
>>>>>>> 2019-09-10 16:33:36.773 WARN  o.a.f.r.r.h.l.m.MetricFetcherImpl   -
>>>>>>> Requesting TaskManager's path for query services failed.
>>>>>>>
>>>>>>> java.util.concurrent.CompletionException:
>>>>>>> akka.pattern.AskTimeoutException: Ask timed out on
>>>>>>> [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms].
>>>>>>> Sender[null] sent message of type
>>>>>>> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:816)
>>>>>>>
>>>>>>> at akka.dispatch.OnComplete.internal(Future.scala:258)
>>>>>>>
>>>>>>> at akka.dispatch.OnComplete.internal(Future.scala:256)
>>>>>>>
>>>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
>>>>>>>
>>>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
>>>>>>>
>>>>>>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>>>>>>>
>>>>>>> at
>>>>>>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
>>>>>>>
>>>>>>> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>>>>>>>
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>>
>>>>>>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
>>>>>>> [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms].
>>>>>>> Sender[null] sent message of type
>>>>>>> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>>
>>>>>>> at
>>>>>>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>>>>>>>
>>>>>>> ... 9 common frames omitted
>>>>>>>
>>>>>>> 2019-09-10 16:33:48.782 WARN  o.a.f.r.r.h.l.m.MetricFetcherImpl   -
>>>>>>> Requesting TaskManager's path for query services failed.
>>>>>>>
>>>>>>> java.util.concurrent.CompletionException:
>>>>>>> akka.pattern.AskTimeoutException: Ask timed out on
>>>>>>> [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms].
>>>>>>> Sender[null] sent message of type
>>>>>>> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:816)
>>>>>>>
>>>>>>> at akka.dispatch.OnComplete.internal(Future.scala:258)
>>>>>>>
>>>>>>> at akka.dispatch.OnComplete.internal(Future.scala:256)
>>>>>>>
>>>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
>>>>>>>
>>>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
>>>>>>>
>>>>>>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>>>>>>>
>>>>>>> at
>>>>>>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
>>>>>>>
>>>>>>> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>>>>>>>
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>>
>>>>>>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
>>>>>>> [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms].
>>>>>>> Sender[null] sent message of type
>>>>>>> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>>
>>>>>>> at
>>>>>>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>>>>>>>
>>>>>>> ... 9 common frames omitted
>>>>>>>
>>>>>>> 2019-09-10 16:34:00.802 WARN  o.a.f.r.r.h.l.m.MetricFetcherImpl   -
>>>>>>> Requesting TaskManager's path for query services failed.
>>>>>>>
>>>>>>> java.util.concurrent.CompletionException:
>>>>>>> akka.pattern.AskTimeoutException: Ask timed out on
>>>>>>> [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms].
>>>>>>> Sender[null] sent message of type
>>>>>>> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:816)
>>>>>>>
>>>>>>> at akka.dispatch.OnComplete.internal(Future.scala:258)
>>>>>>>
>>>>>>> at akka.dispatch.OnComplete.internal(Future.scala:256)
>>>>>>>
>>>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
>>>>>>>
>>>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
>>>>>>>
>>>>>>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>>>>>>>
>>>>>>> at
>>>>>>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
>>>>>>>
>>>>>>> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>>>>>>>
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>>
>>>>>>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
>>>>>>> [Actor[akka://flink/user/dispatcher#374570759]] after [10000 ms].
>>>>>>> Sender[null] sent message of type
>>>>>>> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>>>
>>>>>>> at
>>>>>>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>>>>>>>
>>>>>>> ... 9 common frames omitted
>>>>>>>
>>>>>>> 2019-09-10 16:34:03.800 INFO  o.a.flink.yarn.YarnResourceManager   -
>>>>>>> The heartbeat of TaskManager with id
>>>>>>> container_e67_1568017536744_0044_01_000023 timed out.
>>>>>>>
>>>>>>> 2019-09-10 16:34:03.801 INFO  o.a.flink.yarn.YarnResourceManager   -
>>>>>>> Closing TaskExecutor connection 
>>>>>>> container_e67_1568017536744_0044_01_000023
>>>>>>> because: The heartbeat of TaskManager with id
>>>>>>> container_e67_1568017536744_0044_01_000023  timed out.
>>>>>>>
>>>>>>> 2019-09-10 16:34:03.803 INFO  o.a.f.r.e.ExecutionGraph   -
>>>>>>> my-function (1/32) (ae416d03ddc94a3633673c4050b8f2ae) switched from 
>>>>>>> RUNNING
>>>>>>> to FAILED.
>>>>>>>
>>>>>>> org.apache.flink.util.FlinkException: The assigned slot
>>>>>>> container_e67_1568017536744_0044_01_000023_0 was removed.
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:899)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:869)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:1080)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:391)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:845)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(ResourceManager.java:1187)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:318)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>>
>>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:392)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:185)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>>>>>>>
>>>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>>>>>>>
>>>>>>> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>>>>>>>
>>>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>>>>>>>
>>>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>>>>>>>
>>>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>>>>>>>
>>>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>>>>>>>
>>>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>>>
>>>>>>> 2019-09-10 16:34:03.803 INFO  o.a.f.r.c.CheckpointCoordinator   -
>>>>>>> Discarding checkpoint 49 of job c8a9ae03785ade86348c3189cf7dd965.
>>>>>>>
>>>>>>> org.apache.flink.util.FlinkException: The assigned slot
>>>>>>> container_e67_1568017536744_0044_01_000023_0 was removed.
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:899)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:869)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:1080)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:391)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:845)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(ResourceManager.java:1187)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:318)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>>
>>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:392)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:185)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>>>>>>>
>>>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>>>>>>>
>>>>>>> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>>>>>>>
>>>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>>>>>>>
>>>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>>>>>>>
>>>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>>>>>>>
>>>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>>>>>>>
>>>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>>>
>>>>>>> 2019-09-10 16:34:03.803 INFO  o.a.f.r.e.ExecutionGraph   - Job
>>>>>>> ProcessingJob (c8a9ae03785ade86348c3189cf7dd965) switched from state
>>>>>>> RUNNING to FAILING.
>>>>>>>
>>>>>>> org.apache.flink.util.FlinkException: The assigned slot
>>>>>>> container_e67_1568017536744_0044_01_000023_0 was removed.
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:899)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:869)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:1080)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:391)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:845)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(ResourceManager.java:1187)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:318)
>>>>>>>
>>>>>>> at
>>>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>>
>>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:392)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:185)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
>>>>>>>
>>>>>>> at
>>>>>>> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>>>>>>>
>>>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>>>>>>>
>>>>>>> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>>>>>>>
>>>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>>>>>>>
>>>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>>>>>>>
>>>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>>>>>>>
>>>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>>>>>>>
>>>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>>
>>>>>>> at
>>>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> hs_err_pid_262348.log for failed container
>>>>>>>
>>>>>>> #
>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>> #
>>>>>>> #  SIGSEGV (0xb) at pc=0x00007f294944b2c2, pid=262348,
>>>>>>> tid=0x00007f2916833700
>>>>>>> #
>>>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0_112-b15) (build
>>>>>>> 1.8.0_112-b15)
>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.112-b15 mixed mode
>>>>>>> linux-amd64 compressed oops)
>>>>>>> # Problematic frame:
>>>>>>> # C  [libzip.so+0xb2c2]  inflateEnd+0x32
>>>>>>> #
>>>>>>> # Core dump written. Default location:
>>>>>>> /data/hadoop/yarn/local/usercache/flink/appcache/application_1568017536744_0044/container_e67_1568017536744_0044_01_000023/core
>>>>>>> or core.262348
>>>>>>> #
>>>>>>> # If you would like to submit a bug report, please visit:
>>>>>>> #   http://bugreport.java.com/bugreport/crash.jsp
>>>>>>> # The crash happened outside the Java Virtual Machine in native code.
>>>>>>> # See problematic frame for where to report the bug.
>>>>>>> #
>>>>>>>
>>>>>>> ---------------  T H R E A D  ---------------
>>>>>>>
>>>>>>> Current thread (0x00007f29440e8000):  JavaThread "Finalizer" daemon
>>>>>>> [_thread_in_native, id=262401, 
>>>>>>> stack(0x00007f2916733000,0x00007f2916834000)]
>>>>>>>
>>>>>>> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr:
>>>>>>> 0x0000000000001080
>>>>>>>
>>>>>>> Registers:
>>>>>>> RAX=0x00007f0100000001, RBX=0x00007f2945e52770,
>>>>>>> RCX=0x0000000000000180, RDX=0x00007f2945e52770
>>>>>>> RSP=0x00007f29168323d0, RBP=0x00007f29168323e0,
>>>>>>> RSI=0x0000000000001040, RDI=0x00007f2945e52770
>>>>>>> R8 =0x00000007bff0f170, R9 =0x0000000000000006,
>>>>>>> R10=0x00007f2935017a08, R11=0x00007f294b583d50
>>>>>>> R12=0x00007f29440e81f8, R13=0x00007f293135cc58,
>>>>>>> R14=0x00007f2916832490, R15=0x00007f29440e8000
>>>>>>> RIP=0x00007f294944b2c2, EFLAGS=0x0000000000010202,
>>>>>>> CSGSFS=0x0000000000000033, ERR=0x0000000000000004
>>>>>>>   TRAPNO=0x000000000000000e
>>>>>>>
>>>>>>> Top of Stack: (sp=0x00007f29168323d0)
>>>>>>> 0x00007f29168323d0:   ffffffff440e8000 00007f2945e52770
>>>>>>> 0x00007f29168323e0:   00007f2916832400 00007f294944338e
>>>>>>> 0x00007f29168323f0:   00007f293135cc58 0000000000000000
>>>>>>> 0x00007f2916832400:   00007f2916832468 00007f2935017a34
>>>>>>> 0x00007f2916832410:   00007f2916832540 00007f293501306d
>>>>>>> 0x00007f2916832420:   00007f29350055d0 00007f2916832428
>>>>>>> 0x00007f2916832430:   0000000000000000 00007f2916832490
>>>>>>> 0x00007f2916832440:   00007f293135cd70 0000000000000000
>>>>>>> 0x00007f2916832450:   00007f293135cc58 0000000000000000
>>>>>>> 0x00007f2916832460:   00007f2916832488 00007f29168324e8
>>>>>>> 0x00007f2916832470:   00007f29350082bd 00000006ab616900
>>>>>>> 0x00007f2916832480:   00007f2935011538 00007f2945e52770
>>>>>>> 0x00007f2916832490:   00000007bff0f1e8 00000007bff0f1e8
>>>>>>> 0x00007f29168324a0:   00000007bff0f1e8 00007f2916832498
>>>>>>> 0x00007f29168324b0:   00007f293135c5e5 00007f2916832518
>>>>>>> 0x00007f29168324c0:   00007f293135cd70 00007f29313f9840
>>>>>>> 0x00007f29168324d0:   00007f293135c618 00007f2916832488
>>>>>>> 0x00007f29168324e0:   00007f2916832518 00007f2916832580
>>>>>>> 0x00007f29168324f0:   00007f29350082bd 0000000000000000
>>>>>>> 0x00007f2916832500:   00007f2945e52770 0000000000000000
>>>>>>> 0x00007f2916832510:   00000007bff0f1e8 00000007bff0cd38
>>>>>>> 0x00007f2916832520:   0000000000000009 00000007bff0f158
>>>>>>> 0x00007f2916832530:   0000006ce4720709 00000007bff0cd98
>>>>>>> 0x00007f2916832540:   00007f2916832520 00007f293132f631
>>>>>>> 0x00007f2916832550:   00007f29168325d8 00007f2931330ce0
>>>>>>> 0x00007f2916832560:   0000000000000000 00007f293132f6c0
>>>>>>> 0x00007f2916832570:   00007f2916832518 00007f29168325d8
>>>>>>> 0x00007f2916832580:   00007f2916832620 00007f29350082bd
>>>>>>> 0x00007f2916832590:   0000000000000000 0000000000000000
>>>>>>> 0x00007f29168325a0:   0000000000000000 0000000000000000
>>>>>>> 0x00007f29168325b0:   0000000000000000 0000000000000000
>>>>>>> 0x00007f29168325c0:   00000007bff0f158 00000007bff0cd38
>>>>>>>
>>>>>>> Instructions: (pc=0x00007f294944b2c2)
>>>>>>> 0x00007f294944b2a2:   fe ff ff ff 48 83 c4 08 5b c9 c3 0f 1f 00 48 8b
>>>>>>> 0x00007f294944b2b2:   77 28 48 85 f6 74 e8 48 8b 47 38 48 85 c0 74 df
>>>>>>> 0x00007f294944b2c2:   48 8b 56 40 48 85 d2 74 11 48 89 d6 48 8b 7f 40
>>>>>>> 0x00007f294944b2d2:   ff d0 48 8b 43 38 48 8b 73 28 48 8b 7b 40 ff
>>>>>>> d0
>>>>>>>
>>>>>>> Register to memory mapping:
>>>>>>>
>>>>>>> RAX=0x00007f0100000001 is an unknown value
>>>>>>> RBX=0x00007f2945e52770 is an unknown value
>>>>>>> RCX=0x0000000000000180 is an unknown value
>>>>>>> RDX=0x00007f2945e52770 is an unknown value
>>>>>>> RSP=0x00007f29168323d0 is pointing into the stack for thread:
>>>>>>> 0x00007f29440e8000
>>>>>>> RBP=0x00007f29168323e0 is pointing into the stack for thread:
>>>>>>> 0x00007f29440e8000
>>>>>>> RSI=0x0000000000001040 is an unknown value
>>>>>>> RDI=0x00007f2945e52770 is an unknown value
>>>>>>> R8 =0x00000007bff0f170 is an oop
>>>>>>> [Ljava.lang.Object;
>>>>>>>  - klass: 'java/lang/Object'[]
>>>>>>>  - length: 16
>>>>>>> R9 =0x0000000000000006 is an unknown value
>>>>>>> R10=0x00007f2935017a08 is at code_begin+808 in an Interpreter codelet
>>>>>>> method entry point (kind = native)  [0x00007f29350176e0,
>>>>>>> 0x00007f2935017fe0]  2304 bytes
>>>>>>> R11=0x00007f294b583d50: <offset 0x9c3d50> in
>>>>>>> /usr/jdk64/jdk1.8.0_112/jre/lib/amd64/server/libjvm.so at 
>>>>>>> 0x00007f294abc0000
>>>>>>> R12=0x00007f29440e81f8 is an unknown value
>>>>>>> R13={method} {0x00007f293135cc58} 'end' '(J)V' in
>>>>>>> 'java/util/zip/Inflater'
>>>>>>> R14=0x00007f2916832490 is pointing into the stack for thread:
>>>>>>> 0x00007f29440e8000
>>>>>>> R15=0x00007f29440e8000 is a thread
>>>>>>>
>>>>>>>
>>>>>>> Stack: [0x00007f2916733000,0x00007f2916834000],
>>>>>>>  sp=0x00007f29168323d0,  free space=1020k
>>>>>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code,
>>>>>>> C=native code)
>>>>>>> C  [libzip.so+0xb2c2]  inflateEnd+0x32
>>>>>>> C  [libzip.so+0x338e]  Java_java_util_zip_Inflater_end+0x1e
>>>>>>> j  java.util.zip.Inflater.end(J)V+0
>>>>>>> j  java.util.zip.Inflater.end()V+29
>>>>>>> j  java.util.zip.ZipFile.close()V+169
>>>>>>> j  sun.net.www.protocol.jar.URLJarFile.close()V+18
>>>>>>> j  sun.net.www.protocol.jar.URLJarFile.finalize()V+1
>>>>>>> J 9535% C2 java.lang.ref.Finalizer$FinalizerThread.run()V (55 bytes)
>>>>>>> @ 0x00007f293674cec0 [0x00007f293674cc00+0x2c0]
>>>>>>> v  ~StubRoutines::call_stub
>>>>>>> V  [libjvm.so+0x690c66]  JavaCalls::call_helper(JavaValue*,
>>>>>>> methodHandle*, JavaCallArguments*, Thread*)+0x1056
>>>>>>> V  [libjvm.so+0x691171]  JavaCalls::call_virtual(JavaValue*,
>>>>>>> KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
>>>>>>> V  [libjvm.so+0x691617]  JavaCalls::call_virtual(JavaValue*, Handle,
>>>>>>> KlassHandle, Symbol*, Symbol*, Thread*)+0x47
>>>>>>> V  [libjvm.so+0x72c990]  thread_entry(JavaThread*, Thread*)+0xa0
>>>>>>> V  [libjvm.so+0xa755f3]  JavaThread::thread_main_inner()+0x103
>>>>>>> V  [libjvm.so+0xa7573c]  JavaThread::run()+0x11c
>>>>>>> V  [libjvm.so+0x926138]  java_start(Thread*)+0x108
>>>>>>> C  [libpthread.so.0+0x7e25]  start_thread+0xc5
>>>>>>>
>>>>>>> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
>>>>>>> j  java.util.zip.Inflater.end(J)V+0
>>>>>>> j  java.util.zip.Inflater.end()V+29
>>>>>>> j  java.util.zip.ZipFile.close()V+169
>>>>>>> j  sun.net.www.protocol.jar.URLJarFile.close()V+18
>>>>>>> j  sun.net.www.protocol.jar.URLJarFile.finalize()V+1
>>>>>>> J 9535% C2 java.lang.ref.Finalizer$FinalizerThread.run()V (55 bytes)
>>>>>>> @ 0x00007f293674cec0 [0x00007f293674cc00+0x2c0]
>>>>>>> v  ~StubRoutines::call_stub
>>>>>>>
>>>>>>

Reply via email to