I have seemingly solved this issue, at the very least with linux-aws
version 4.4.0-1016-aws.  The specific issue I was seeing was order-2
allocations failing when the OOM killer triggered.  At the time I
thought the issue was due to XFS and memory fragmentation caused by the
very large number of memory-mapped files in Elasticsearch/Lucene.  When
we moved to ext4 the rate of OOM killer firing dropped, but it did not
stop.  We made the following two sysctl changes, which have effectively
stopped higher-order memory allocations from failing and the OOM killer
from firing.

Note that these settings were used on i3.2xlarge hosts that have 60G of
RAM, so your mileage may vary.  We also do not run swap on our servers;
adding swap could likely have helped, but that is not an option for us.

vm.min_free_kbytes = 1000000 # We set this to keep about 1G of RAM
free for the kernel, in the hope that even if memory is heavily
fragmented there is still enough free memory for Linux to satisfy a
higher-order allocation before the OOM killer acts.
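
To judge whether fragmentation is really the problem on a given host, it
can help to look at how much free memory sits in blocks of order 2 or
larger.  The following is only a rough sketch, not something we ran on
the hosts above: it parses text in the /proc/buddyinfo layout, the
function names are made up for illustration, the sample data is
invented, and a 4 kB page size is assumed.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block count per order]} from buddyinfo text."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone Normal c0 c1 c2 ..."
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(c) for c in parts[4:]]
    return zones


def free_kbytes_at_or_above(counts, order, page_kb=4):
    """Free memory (kB) held in blocks of at least the given order.

    page_kb=4 is an assumption (typical x86_64 page size).
    """
    return sum(n * page_kb * (2 ** o) for o, n in enumerate(counts) if o >= order)


# Invented sample data, NOT captured from the affected hosts.
SAMPLE = """\
Node 0, zone      DMA      1      1      0      0      2      1      1      0      1      1      3
Node 0, zone   Normal   5000   1200      3      0      0      0      0      0      0      0      0
"""

if __name__ == "__main__":
    for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
        kb = free_kbytes_at_or_above(counts, order=2)
        print(f"node {node} zone {zone}: {kb} kB free in order>=2 blocks")
```

On a real host you would pass open("/proc/buddyinfo").read() instead of
SAMPLE.  A zone with plenty of order-0 and order-1 pages but almost
nothing at order 2 and above is the pattern described here.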

vm.zone_reclaim_mode = 1 # Our hope here was to make the kernel more
aggressive about reclaiming memory.
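
To confirm that a host is actually running with the two values above,
something like the following could be used.  It is a minimal sketch
under the assumption that the settings are visible under /proc/sys/vm;
the helper names (read_vm_sysctl, check) are hypothetical, and
zone_reclaim_mode may be absent on kernels built without NUMA support,
in which case it is reported as None.

```python
from pathlib import Path

# The two values quoted in this message; tune them for your own hosts.
WANTED = {"min_free_kbytes": 1000000, "zone_reclaim_mode": 1}


def read_vm_sysctl(name, root="/proc/sys/vm"):
    """Read an integer vm.* sysctl from procfs; None if missing or unreadable."""
    try:
        return int(Path(root, name).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def check(wanted=WANTED, root="/proc/sys/vm"):
    """Return {name: (current, wanted)} for every setting that differs."""
    diffs = {}
    for name, want in wanted.items():
        have = read_vm_sysctl(name, root)
        if have != want:
            diffs[name] = (have, want)
    return diffs


if __name__ == "__main__":
    # Prints nothing when both settings already match.
    for name, (have, want) in check().items():
        print(f"vm.{name}: current={have} wanted={want}")
```

This only reads the current values; making them persistent is still a
matter of putting the two lines into your sysctl configuration.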

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1655842

Title:
  "Out of memory" errors after upgrade to 4.4.0-59

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1655842/+subscriptions

