Just saw Stefan's response; it is basically the same.

We can null out the field either on deploy or on archival. Doing it on deploy
would be even more memory-friendly.
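
A minimal sketch of the two options, just to illustrate the idea. The class
and method names below are hypothetical stand-ins, not Flink's actual
Execution code, and the real fix may look different:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the restored state handles (not Flink's class). */
final class StateSnapshotSketch {
    // Operator id -> serialized state, e.g. Kafka offsets of a source operator.
    final Map<String, byte[]> handles = new HashMap<>();
}

/** Hypothetical, heavily simplified stand-in for one execution attempt. */
final class ExecutionSketch {
    // Set when the job is initialized from a checkpoint/savepoint.
    private StateSnapshotSketch taskRestore;

    ExecutionSketch(StateSnapshotSketch taskRestore) {
        this.taskRestore = taskRestore;
    }

    /** Option 1: release the reference as soon as the task has been deployed. */
    void deploy() {
        StateSnapshotSketch toShip = taskRestore;
        submitToTaskManager(toShip);
        // The state has been handed over; drop our reference so it can be GC'ed.
        taskRestore = null;
    }

    /** Option 2: release the reference only when the execution is archived. */
    void archive() {
        taskRestore = null;
    }

    boolean holdsRestoreState() {
        return taskRestore != null;
    }

    private void submitToTaskManager(StateSnapshotSketch state) {
        // Placeholder for shipping the deployment descriptor (assumption).
    }
}

final class RestoreStateDemo {
    public static void main(String[] args) {
        StateSnapshotSketch snapshot = new StateSnapshotSketch();
        snapshot.handles.put("kafka-source", new byte[1024]);

        ExecutionSketch execution = new ExecutionSketch(snapshot);
        System.out.println("before deploy: " + execution.holdsRestoreState());
        execution.deploy();
        System.out.println("after deploy: " + execution.holdsRestoreState());
    }
}
```

With option 1 the snapshot is only reachable for the short window before
deployment; with option 2 it stays reachable for the whole runtime of the task.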

@Steven - can you open a JIRA ticket for this?

On Fri, Jun 29, 2018 at 9:08 PM, Stephan Ewen <se...@apache.org> wrote:

> The problem seems to be that the Executions that are kept for history
> (mainly metrics / web UI) still hold a reference to their TaskStateSnapshot.
>
> Upon archival, that field needs to be cleared for GC.
>
> This is quite clearly a bug...
>
> On Fri, Jun 29, 2018 at 11:29 AM, Stefan Richter <
> s.rich...@data-artisans.com> wrote:
>
>> Hi Steven,
>>
>> from your analysis, I would conclude the following problem.
>> ExecutionVertexes hold executions, which are bootstrapped with the state
>> (in form of the map of state handles) when the job is initialized from a
>> checkpoint/savepoint. It holds a reference on this state, even when the
>> task is already running. I would assume it is safe to set the reference to
>> TaskStateSnapshot to null at the end of the deploy() method, so that the
>> state can be GC’ed. From the provided stats, I cannot say if maybe the JM is also
>> holding references to too many ExecutionVertexes, but that would be a
>> different story.
>>
>> Best,
>> Stefan
>>
>> On 29.06.2018 at 01:29, Steven Wu <stevenz...@gmail.com> wrote:
>>
>> First, some context about the job
>> * embarrassingly parallel: all operators are chained together
>> * parallelism is over 1,000
>> * stateless except for Kafka source operators. checkpoint size is 8.4 MB.
>> * set "state.backend.fs.memory-threshold" so that only the jobmanager writes
>> checkpoint data to S3
>> * internal checkpoint with 10 checkpoints retained in history
>>
>> We don't expect jobmanager to use much memory at all. But it seems that
>> this high memory footprint (or leak) happened occasionally, maybe under
>> certain conditions. Any hypothesis?
>>
>> Thanks,
>> Steven
>>
>>
>> 41,567 ExecutionVertex objects retained 9+ GB of memory
>> <image.png>
>>
>>
>> Expanding one ExecutionVertex: it seems to be storing the Kafka offsets
>> for the source operator
>> <image.png>
>>
>>
>>
>
