Hi Tim,

I'm not aware of any memory issues that depend on the deployment mode used. Have you checked the logs for hints? You could also try to extract a heap dump, which might help you analyze what is consuming the memory.
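As a rough sanity check on the GC figures from your message (quoted below), here is a small Python sketch that turns them into a share of wall-clock time. It assumes that the "Time" column is cumulative milliseconds and that the runtime was exactly 45 minutes; neither is stated explicitly, and the function names are only illustrative.

```python
# GC figures as quoted from the Flink dashboard after roughly 45 minutes.
# Assumption: the "Time" column is cumulative milliseconds.
RUNTIME_MS = 45 * 60 * 1000

# mode -> collector -> (collection count, total time in ms)
gc_stats = {
    "job cluster": {
        "G1_Young_Generation": (1250, 107937),
        "G1_Old_Generation": (322, 2432362),
    },
    "session cluster": {
        "G1_Young_Generation": (1920, 20575),
        "G1_Old_Generation": (0, 0),
    },
}


def gc_share(time_ms, runtime_ms=RUNTIME_MS):
    """Fraction of the runtime attributed to one collector."""
    return time_ms / runtime_ms


def mean_pause_ms(count, time_ms):
    """Average time per collection; 0.0 if the collector never ran."""
    return time_ms / count if count else 0.0


for mode, collectors in gc_stats.items():
    for name, (count, time_ms) in collectors.items():
        print(
            f"{mode:16s} {name:20s} "
            f"share={gc_share(time_ms):6.1%} "
            f"mean={mean_pause_ms(count, time_ms):8.1f} ms"
        )
```

Under those assumptions, the old-generation collector accounts for roughly 90% of the 45 minutes in job cluster mode, at about 7.5 seconds per collection on average. That would fit a heap that is almost full and would explain the slowdown, but it does not say what is holding the memory; a heap dump would.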
The TaskManager and JobManager log their effective memory-related configuration during startup. Look for the "Preconfiguration" section in each of the log files to get a breakdown of how much memory is assigned to each memory pool.

Best,
Matthias

On Tue, Nov 10, 2020 at 3:37 PM Tim Eckhardt <tim.eckha...@uniberg.com> wrote:

> Hi there,
>
> I have a problem with running a flink job in job cluster mode using flink 1.11.1 (also tried 1.11.2).
>
> The same job is running well using the session cluster mode as well as using flink 1.10.0 in job cluster mode.
>
> The job starts running and is running for quite some time but it runs a lot slower than in session cluster mode and crashes after running for about an hour. I can observe in the flink dashboard that the JVM heap is constant at a high level and is getting slowly closer to the limit (4.13GB in my case) which it reaches close to the job crashing.
>
> There is also some G1_Old_Generation garbage collection going on which I cannot observe in session mode either.
>
> GC values after running for about 45min:
>
> (Collector, Count, Time)
> G1_Young_Generation   1,250     107,937
> G1_Old_Generation       322   2,432,362
>
> Compared to the GC values of the same job in session cluster mode (after the same runtime):
>
> G1_Young_Generation   1,920      20,575
> G1_Old_Generation         0           0
>
> So my vague guess is that it has to be something memory related, maybe configuration wise.
>
> To simplify the setup only one jobmanager and one taskmanager are used. The taskmanager has a memory setting of: taskmanager.memory.process.size: 10000m which should be totally fine for the server. The jobmanager has a defined heap_size of 1600m.
>
> Maybe somebody has experienced something like this before?
>
> Also is there a way to export the currently loaded configuration parameters of the job- and taskmanagers in a cluster?
> For example I can’t see the current memory process size of the taskmanager in the flink dashboard. That way I could compare the running and crashing setups more easily (using docker and environment variables for configuration at the moment which makes it a bit harder to debug).
>
> Thanks.