Re: Job crash in job cluster mode

Matthias Pohl Tue, 10 Nov 2020 09:01:26 -0800

Hi Tim,
I'm not aware of any memory issues that are tied to the deployment mode.
Have you checked the logs for hints? You could also try to capture a heap
dump, which might help you analyze what is consuming the memory.
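
As a rough sketch of the heap dump step: the JDK's jmap tool can write a
dump of a running JVM. The helper names below (heap_dump_command, dump_heap)
are hypothetical. The sketch assumes jmap is on the PATH of the machine or
container that runs the TaskManager and that you already know the JVM's
process ID.

```python
import subprocess
from typing import List


def heap_dump_command(pid: int, out_file: str, live_only: bool = True) -> List[str]:
    """Build a jmap command line that writes a binary (HPROF) heap dump."""
    options = ["format=b", f"file={out_file}"]
    if live_only:
        # "live" dumps only reachable objects; the JVM runs a full GC first.
        options.insert(0, "live")
    return ["jmap", "-dump:" + ",".join(options), str(pid)]


def dump_heap(pid: int, out_file: str) -> None:
    """Run jmap against the given JVM; raises CalledProcessError on failure."""
    subprocess.run(heap_dump_command(pid, out_file), check=True)
```

The resulting file can be opened in a heap analyzer to see which objects
hold on to the memory. Taking one dump early and one shortly before the
crash may make the growth easier to spot.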


The TaskManager and JobManager log their effective memory-related
configuration during startup. Look for the "Preconfiguration" section in
each log file to get a breakdown of how much memory is configured for each
memory pool.
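
To compare the working and the crashing setup, something like the following
could pull the memory settings out of two log files and show only the
differences. This is a minimal sketch: the function names are made up, and
the patterns are an assumption about how the options appear in the log
(key=value, key: value, or key, value, plus the usual JVM flags), so they
may need adjusting to your actual log output.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed patterns: Flink memory options and common JVM memory flags.
MEMORY_OPTION = re.compile(
    r"((?:taskmanager|jobmanager)\.memory\.[\w.\-]+)\s*[=:,]\s*(\S+)"
)
JVM_FLAG = re.compile(
    r"(-Xmx|-Xms|-XX:MaxDirectMemorySize=|-XX:MaxMetaspaceSize=)(\S+)"
)


def extract_memory_settings(log_text: str) -> Dict[str, str]:
    """Collect memory-related options found anywhere in the log text."""
    settings: Dict[str, str] = {}
    for key, value in MEMORY_OPTION.findall(log_text):
        settings[key] = value
    for flag, value in JVM_FLAG.findall(log_text):
        settings[flag.rstrip("=")] = value
    return settings


def diff_settings(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return only the options whose values differ between the two setups."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Reading both TaskManager logs into strings and passing the two extracted
dictionaries to diff_settings should then list only the options that differ
between the two deployments.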

Best,
Matthias

On Tue, Nov 10, 2020 at 3:37 PM Tim Eckhardt <[email protected]>
wrote:

> Hi there,
>
>
>
> I have a problem running a Flink job in job cluster mode using Flink
> 1.11.1 (also tried 1.11.2).
>
> The same job runs well in session cluster mode, and also in job cluster
> mode using Flink 1.10.0.
>
>
>
> The job starts and runs for quite some time, but it is a lot slower than
> in session cluster mode and crashes after about an hour. In the Flink
> dashboard I can see that the JVM heap stays at a constantly high level and
> slowly approaches the limit (4.13GB in my case), which it reaches shortly
> before the job crashes.
>
> There is also some G1_Old_Generation garbage collection going on, which I
> do not observe in session mode.
>
>
>
> GC values after running for about 45min:
>
>
>
> Collector             Count       Time
> G1_Young_Generation   1,250    107,937
> G1_Old_Generation       322  2,432,362
>
>
>
> Compared to the GC values of the same job in session cluster mode (after
> the same runtime):
>
>
>
> G1_Young_Generation   1,920     20,575
> G1_Old_Generation         0          0
>
>
>
> So my vague guess is that it is something memory-related, maybe in the
> configuration.
>
>
>
> To simplify the setup, only one jobmanager and one taskmanager are used.
> The taskmanager has a memory setting of taskmanager.memory.process.size:
> 10000m, which should be totally fine for the server. The jobmanager has a
> defined heap_size of 1600m.
>
>
>
> Maybe somebody has experienced something like this before?
>
>
>
> Also, is there a way to export the currently loaded configuration
> parameters of the job- and taskmanagers in a cluster? For example, I can’t
> see the current memory process size of the taskmanager in the Flink
> dashboard. That would let me compare the running and crashing setups more
> easily (I'm using docker and environment variables for configuration at
> the moment, which makes it a bit harder to debug).
>
>
>
> Thanks.
>
