FYI: taskmanager.sh sets this parameter itself but also states the following:

    # Long.MAX_VALUE in TB: This is an upper bound, much less direct memory will be used
    TM_MAX_OFFHEAP_SIZE="8388607T"

Nico
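A quick way to verify which direct-memory limit a running TaskManager actually ended up with is to query the JVM flags of its process. A minimal sketch (assuming a JDK with jcmd/jinfo installed on the TaskManager host; <pid> is a placeholder for the TaskManager's process id):

    # List the JVM flags of the running TaskManager and filter for the direct-memory limit
    jcmd <pid> VM.flags | tr ' ' '\n' | grep MaxDirectMemorySize

    # Or query just that one flag
    jinfo -flag MaxDirectMemorySize <pid>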
On Monday, 29 May 2017 15:19:47 CEST Aljoscha Krettek wrote:
> Hi Flavio,
>
> Is this running on YARN or bare metal? Did you manage to find out where this
> insanely large parameter is coming from?
>
> Best,
> Aljoscha
>
> > On 25. May 2017, at 19:36, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> >
> > Hi to all,
> > I think we found the root cause of all the problems. Looking at dmesg,
> > there was a "crazy" total-vm size associated with the OOM error, a LOT
> > bigger than the TaskManager's available memory. In our case, the TM had a
> > max heap of 14 GB while the dmesg error was reporting a required amount of
> > memory in the order of 60 GB!
> >
> > [ 5331.992539] Out of memory: Kill process 24221 (java) score 937 or sacrifice child
> > [ 5331.992619] Killed process 24221 (java) total-vm:64800680kB, anon-rss:31387544kB, file-rss:6064kB, shmem-rss:0kB
> >
> > That definitely wasn't possible using an ordinary JVM (and our TM was
> > running without off-heap settings), so we looked at the parameters used to
> > run the TM JVM, and indeed there was a really huge amount of memory given
> > to MaxDirectMemorySize. To my big surprise, Flink runs a TM with this
> > parameter set to 8,388,607 TB... does it make any sense?? Is the
> > importance of this parameter documented anywhere (and why it is used in
> > non-off-heap mode as well)? Is it related to network buffers? It should
> > also be documented that this parameter should be added to the TM heap when
> > reserving memory for Flink (IMHO).
> >
> > I hope that these painful sessions of Flink troubleshooting can be of
> > added value sooner or later..
> >
> > Best,
> > Flavio
> >
> > On Thu, May 25, 2017 at 10:21 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> > I can confirm that after giving less memory to the Flink TM the job was
> > able to run successfully. After almost 2 weeks of pain, we summarize here
> > our experience with Flink in virtualized environments (such as VMware ESXi):
> >
> > - Disable the virtualization "feature" that transfers a VM from a (heavily
> >   loaded) physical machine to another one (to balance resource consumption).
> > - Check dmesg when a TM dies without logging anything (usually it goes OOM
> >   and the OS kills it, but there you can find the log of this).
> > - CentOS 7 on ESXi seems to start swapping VERY early (in my case I see
> >   the OS start swapping even with 12 out of 32 GB of free memory)! We're
> >   still investigating how this behavior could be fixed: the problem is
> >   that it's better not to disable swapping, because otherwise VMware could
> >   start ballooning (which is definitely worse...).
> >
> > I hope these tips can save someone else's day..
> >
> > Best,
> > Flavio
> >
> > On Wed, May 24, 2017 at 4:28 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> > Hi Greg, you were right! After typing dmesg I found "Out of memory: Kill
> > process 13574 (java)". This is really strange because the JVM of the TM is
> > very calm.
> > Moreover, there are 7 GB of memory available (out of 32), but somehow the
> > OS decides to start swapping and, when it runs out of available swap
> > memory, it kills the Flink TM :(
> >
> > Any idea of what's going on here?
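As a quick reference for the dmesg and swapping checks mentioned above, a minimal sketch of the corresponding commands on a typical Linux host (the swappiness value below is only an illustrative example, not a recommendation):

    # Confirm whether the kernel's OOM killer terminated the TaskManager
    dmesg | grep -i -E "out of memory|killed process"

    # Inspect how eagerly the kernel swaps (default is often 60)
    cat /proc/sys/vm/swappiness

    # Lower it temporarily (takes effect immediately, not persisted across reboots)
    sudo sysctl vm.swappiness=1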
> > On Wed, May 24, 2017 at 2:32 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> > Hi Greg,
> > I carefully monitored all TM memory with jstat -gcutil and there's no full
> > GC, only young GCs.
> >
> > The initial situation on the dying TM is:
> >
> >   S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
> >   0.00 100.00  33.57  88.74  98.42  97.17    159    2.508     1    0.255    2.763
> >   0.00 100.00  90.14  88.80  98.67  97.17    197    2.617     1    0.255    2.873
> >   0.00 100.00  27.00  88.82  98.75  97.17    234    2.730     1    0.255    2.986
> >
> > After about 10 hours of processing it is:
> >
> >   0.00 100.00  21.74  83.66  98.52  96.94   5519   33.011     1    0.255   33.267
> >   0.00 100.00  21.74  83.66  98.52  96.94   5519   33.011     1    0.255   33.267
> >   0.00 100.00  21.74  83.66  98.52  96.94   5519   33.011     1    0.255   33.267
> >
> > So I don't think that OOM could be an option.
> >
> > However, the cluster is running on ESXi vSphere VMs and we have already
> > experienced unexpected job crashes because of ESXi moving a heavily loaded
> > VM to another (less loaded) physical machine.. I wouldn't be surprised if
> > swapping is also handled somehow differently.. Looking at Cloudera widgets
> > I see that the crash is usually preceded by an intense cpu_iowait period.
> > I fear that Flink's unsafe access to memory could be a problem in those
> > scenarios. Am I wrong?
> >
> > Any insight or debugging technique is greatly appreciated.
> > Best,
> > Flavio
> >
> >
> > On Wed, May 24, 2017 at 2:11 PM, Greg Hogan <c...@greghogan.com> wrote:
> > Hi Flavio,
> >
> > Flink handles interrupts, so the only silent killer I am aware of is
> > Linux's OOM killer. Are you seeing such a message in dmesg?
> >
> > Greg
> >
> > On Wed, May 24, 2017 at 3:18 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> > Hi to all,
> > I'd like to know whether memory swapping could cause a taskmanager crash.
> > In my cluster of virtual machines I'm seeing this strange behavior:
> > sometimes, if memory gets swapped, the taskmanager (on that machine) dies
> > unexpectedly without any log about the error.
> >
> > Is that possible or not?
> >
> > Best,
> > Flavio
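For reference, the GC sampling described above and a basic check for swap and iowait pressure can be reproduced with standard JDK/Linux tools. A minimal sketch (<pid> stands for the TaskManager's JVM process id; 10000 is an arbitrary 10-second sampling interval):

    # Print GC utilization (S0/S1/E/O/M/CCS plus young/full GC counts and times) every 10 s
    jstat -gcutil <pid> 10000

    # Watch swap-in/swap-out (si/so) and iowait (wa) on the host, one sample per second
    vmstat 1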