Alright. Thanks a lot for that information!

2017-02-06 14:35 GMT+01:00 Avi Kivity <a...@scylladb.com>:
> It is a bug. In some contexts, the kernel needs to be able to reclaim
> memory instantly, but this is not one of them. Here, the java process is
> creating a new thread, and the kernel is allocating 16kB for its kernel
> stack; that is a regular allocation, not atomic. If you decode the gfp_mask
> value you'll see that the kernel is allowed to initiate I/O and perform
> filesystem operations to satisfy the allocation, which it apparently did
> not.
>
> I do recommend reporting it, it will help others avoid encountering the
> same problem if it gets fixed.
>
> On 02/06/2017 03:07 PM, Benjamin Roth wrote:
>
> Thanks for the reply. We got rid of the OOMs by increasing
> vm.min_free_kbytes; its default of approx. 90 MB is maybe a bit low for
> systems with 128GB.
> I guess the OOM happens because the kernel could not reclaim enough paged
> memory instantly.
> I can't tell if this is really a kernel bug or not. It also was my first
> thought, but in the end the main thing is that it works again, and it does
> with a higher min_free_kbytes.
>
> 2017-02-06 11:53 GMT+01:00 Avi Kivity <a...@scylladb.com>:
>
>> On 01/26/2017 07:36 AM, Benjamin Roth wrote:
>>
>> Hi there,
>>
>> We installed 2 new nodes these days. They run on ubuntu (Ubuntu 16.04.1
>> LTS) with kernel 4.4.0-59-generic. On these nodes (and only on these) CS
>> gets killed by the kernel due to OOM. It seems very strange to me because
>> CS only takes roughly 20GB (out of 128GB); most of the RAM is allocated to
>> page cache.
>>
>> Top looks typically like this:
>> KiB Mem : 13191691+total, 1974964 free, 20278184 used, 10966376+buff/cache
>> KiB Swap: 0 total, 0 free, 0 used. 11051503+avail Mem
>>
>> This is what kern.log says:
>> https://gist.github.com/brstgt/0f1aa6afb558a56d1cadce958db46cf9
>>
>> Has anyone encountered something like this before?
>>
>> 2017-01-26T03:10:45.679458+00:00 cas10 kernel: [52226.449989] Node 0 Normal: 33850*4kB (UMEH) 8*8kB (UMH) 1*16kB (H) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 135480kB
>> 2017-01-26T03:10:45.679460+00:00 cas10 kernel: [52226.449995] Node 1 Normal: 34213*4kB (UME) 176*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 138260kB
>>
>> There is plenty of free memory left (33850+34213)*4kB = 270 MB, but it is
>> fragmented into 4k and 8k blocks, while the kernel is trying to allocate
>> 16kB. Still, the kernel could have evicted some page cache or swapped out
>> anonymous memory. You should report this to lkml, it is a kernel bug.
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
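
For anyone who wants to check the fragmentation argument against their own kern.log, here is a minimal Python sketch that parses the two "Node N Normal:" free-list lines quoted above and compares total free memory with the memory available in blocks of 16kB or larger. The helper names (parse_free_list, free_kb) are made up for this illustration, and the parser assumes the line layout seen in these two log lines; other kernel versions may print it differently.

```python
import re

# The two free-list lines from the quoted kern.log excerpt (timestamps dropped).
NODE0 = ("Node 0 Normal: 33850*4kB (UMEH) 8*8kB (UMH) 1*16kB (H) 0*32kB 0*64kB "
         "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 135480kB")
NODE1 = ("Node 1 Normal: 34213*4kB (UME) 176*8kB (UME) 0*16kB 0*32kB 0*64kB "
         "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 138260kB")

# Matches entries such as "33850*4kB (UMEH)" or "0*32kB".
_ENTRY = re.compile(r"(\d+)\*(\d+)kB(?: \((\w+)\))?")


def parse_free_list(line):
    """Return a list of (count, block_kb, flags) tuples for one zone line."""
    # Keep only the part between the zone name and the "= total" suffix.
    body = line.split("Normal:", 1)[1].split("=", 1)[0]
    return [(int(count), int(kb), flags)
            for count, kb, flags in _ENTRY.findall(body)]


def free_kb(entries, min_block_kb=0):
    """Sum the free memory held in blocks of at least min_block_kb."""
    return sum(count * kb for count, kb, _ in entries if kb >= min_block_kb)


if __name__ == "__main__":
    for name, line in (("node0", NODE0), ("node1", NODE1)):
        entries = parse_free_list(line)
        print(name, "total free kB:", free_kb(entries),
              "| free kB in blocks >= 16kB:", free_kb(entries, 16))
```

On the two quoted lines this reports 135480 kB and 138260 kB free in total, but only 16 kB (node 0) and 0 kB (node 1) in blocks large enough for a 16kB request. The single 16kB block on node 0 is flagged (H), which I take to be a reserved migrate type and so possibly not usable for an ordinary allocation; treat that reading as an assumption.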