> In our internal tests, we also encountered these two issues, and we
> spent much time debugging them. There are two points I need to confirm
> to see whether we share the same problem:
>
> 1. Your job is using the default restart strategy, which is per-second
> restart.
> 2. Your CPU resources on the jobmanager might be small.
>
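The default per-second restart behavior mentioned in point 1 can be made less aggressive in `flink-conf.yaml`. A minimal sketch for Flink 1.10; the attempt count and delay are illustrative values, not recommendations:

```yaml
# flink-conf.yaml: back off between restart attempts instead of
# restarting roughly every second (the default fixed-delay behavior)
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 10
restart-strategy.fixed-delay.delay: 30 s
```

A longer delay gives hanging threads and class loaders more time to be cleaned up between attempts, which also slows metaspace growth when a leak is present.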
> Here are some findings I want to share.
>
> ## Metaspace OOM
>
> As described in https://issues.apache.org/jira/browse/FLINK-15467, when we
> have some job restarts, there will be some threads from the sourceFunction
> hanging, which causes the class loader to be unable to close. New restarts
> would load new classes, then expand the metaspace, and finally OOM happens.
>
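One way to check for this kind of leak is to watch metaspace usage on the affected TaskManager JVM across restarts. A sketch using standard JDK tools; `<pid>` is a placeholder for the TaskManager process id, and `VM.metaspace` requires a newer JDK (10+):

```shell
# Find the TaskManager JVM pid (Flink's TM main class is TaskManagerRunner)
jps -l | grep TaskManager

# Summarize metaspace usage; if it only grows after each job restart,
# a class loading leak is likely
jcmd <pid> VM.metaspace

# Capture a heap dump to look for duplicated classes / leaked classloaders
jmap -dump:live,format=b,file=taskmanager.hprof <pid>
```

In the heap dump, multiple live instances of the same user-code classloader after restarts are the typical signature of this leak.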
> ## Leader retrieving
>
> Constant restarts may be heavy for the jobmanager; if the JM CPU resources
> are not enough, the thread for leader retrieving may get stuck.
>
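If the cluster runs on Kubernetes, giving the JobManager pod an explicit CPU request helps rule out the starved-leader-retrieval case. A sketch of a container spec fragment; the resource values are illustrative and should be sized to the workload:

```yaml
# JobManager container spec fragment: reserve CPU so the leader-retrieval
# thread is not starved during restart storms
resources:
  requests:
    cpu: "1"
    memory: "2Gi"
  limits:
    cpu: "2"
    memory: "2Gi"
```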
> Best Regards,
>
> Brian
>
>
>
> *From:* Xintong Song
> *Sent:* Tuesday, September 22, 2020 10:16
> *To:* Claude M; user
> *Subject:* Re: metaspace out-of-memory & error while retrieving the
> leader gateway
>
Hi Claude,

## Metaspace OOM

As the error message already suggested, the metaspace OOM you encountered
is likely caused by a class loading leak. I think you are on the right
direction trying to look into the heap dump and find out where the leak
comes from. IIUC, after removing the ZK folder, you are now able

IIUC, in your case the leader retrieving problem is triggered by adding the
`java.opts`? Then could you try to find and post the complete command for
launching the JVM process? You can try logging into the pod and executing
`ps -ef | grep `.

A few more questions:
- What do you mean by "resol
Hello,

I upgraded from Flink 1.7.2 to 1.10.2. One of the jobs running on the task
managers is periodically crashing w/ the following error:
java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error
has occurred. This can mean two things: either the job requires a larger
size of JVM metaspace to load classes or there is a class loading leak.
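For the first of those two causes (the job legitimately needing more class space), the metaspace ceiling can be raised in `flink-conf.yaml`. A sketch assuming Flink 1.10's memory model; the value is illustrative:

```yaml
# flink-conf.yaml: raise the JVM metaspace limit for TaskManagers
taskmanager.memory.jvm-metaspace.size: 512m
```

Note that if the real cause is a class loading leak, raising this limit only delays the OOM; the leak itself still has to be found and fixed.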