Re: [ceph-users] OSD killed by OOM when many cache available

2017-11-17 Thread Eric Nelson
As I understand it, the memory would show up in /proc/meminfo or vmstat as cached. However, since it's being used for the page cache, the kernel cannot allocate it at the time of oom-killer invocation because it's paged out (likely for cached reads/writes to your storage device). You could call # sync ;
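For anyone wanting to compare the free and cached figures mentioned above, here is a minimal Python sketch that parses /proc/meminfo-style text. The helper names (`parse_meminfo`, `summarize`) are made up for illustration, and `MemAvailable` is only reported by kernels that expose it.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are
    plain counts and are stored as-is.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Return (MemFree, Cached, MemAvailable); missing fields are None."""
    return (
        fields.get("MemFree"),
        fields.get("Cached"),
        fields.get("MemAvailable"),
    )
```

On the affected host this could be called as `summarize(parse_meminfo(open("/proc/meminfo").read()))`.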

Re: [ceph-users] OSD killed by OOM when many cache available

2017-11-17 Thread Sam Huracan
@Eric: How can I check the status of fscache? Why could it be the root cause? Thanks

2017-11-18 7:30 GMT+07:00 Eric Nelson :
> One thing that doesn't show up is fs cache, which is likely the cause
> here. We went through this on our SSDs and had to add the following to stop
> the crashes. I believe vm.vfs

Re: [ceph-users] OSD killed by OOM when many cache available

2017-11-17 Thread Eric Nelson
One thing that doesn't show up is fs cache, which is likely the cause here. We went through this on our SSDs and had to add the following to stop the crashes. I believe vm.vfs_cache_pressure and min_free_kbytes were the really helpful things in getting the crashes to stop. HTH! sysctl_param 'vm.vf
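Since the message is cut off before the actual settings, no values are suggested here. As a rough sketch, the current values of the two tunables named above can be read from /proc/sys/vm before changing anything. The function name `read_vm_tunables` and the `root` parameter are hypothetical, added only so the code can be pointed at a different directory.

```python
from pathlib import Path

# The two tunables named in the message above.
TUNABLES = ("vfs_cache_pressure", "min_free_kbytes")


def read_vm_tunables(names=TUNABLES, root="/proc/sys/vm"):
    """Return {name: int or None} for each tunable found under `root`."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(root) / name).read_text().strip())
        except (OSError, ValueError):
            # Missing, unreadable, or not a plain integer.
            values[name] = None
    return values
```

Calling `read_vm_tunables()` with the defaults reads the live kernel settings on a Linux host.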

Re: [ceph-users] OSD killed by OOM when many cache available

2017-11-17 Thread Sam Huracan
I see some more logs about memory in syslog:

Nov 17 10:47:17 ceph1 kernel: [2810698.553749] Node 0 DMA free:14828kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15996kB managed:15896kB
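To make per-zone lines like the one above easier to compare, here is a small Python sketch that pulls the `key:NNNkB` pairs out of a line and checks free memory against a watermark. The names `parse_zone_line` and `below_watermark` are illustrative only, and the regex assumes the layout shown in this log; other kernel versions may format the line differently.

```python
import re

# Matches pairs such as "free:14828kB" or "isolated(anon):0kB".
_PAIR = re.compile(r"([A-Za-z_]+(?:\([a-z]+\))?):(\d+)kB")


def parse_zone_line(line):
    """Return {key: kB} for every key:NNNkB pair found in the line."""
    return {key: int(value) for key, value in _PAIR.findall(line)}


def below_watermark(zone, mark="min"):
    """True if the zone's free memory is under the given watermark."""
    return zone["free"] < zone[mark]
```

Running this over each zone in the OOM report shows which zones, if any, had dropped below their `min` watermark.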

[ceph-users] OSD killed by OOM when many cache available

2017-11-17 Thread Sam Huracan
Today, one of our Ceph OSDs went down. I checked syslog and saw that this OSD process was killed by the OOM killer:

Nov 17 10:01:06 ceph1 kernel: [2807926.762304] Out of memory: Kill process 3330 (ceph-osd) score 7 or sacrifice child
Nov 17 10:01:06 ceph1 kernel: [2807926.763745] Killed process 3330 (ceph-osd) to
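For checking whether other OSDs on the host were killed the same way, a minimal Python sketch that scans syslog lines for the "Out of memory: Kill process" message quoted above. The function name `find_oom_kills` is hypothetical, and the pattern matches only the wording shown in this log; other kernel versions phrase the message differently.

```python
import re

# Pattern based on the kernel message wording quoted in this thread.
_OOM = re.compile(r"Out of memory: Kill process (\d+) \(([^)]+)\) score (\d+)")


def find_oom_kills(lines):
    """Return a list of (pid, process_name, score) for each OOM kill line."""
    kills = []
    for line in lines:
        match = _OOM.search(line)
        if match:
            pid, name, score = match.groups()
            kills.append((int(pid), name, int(score)))
    return kills
```

It accepts any iterable of lines, so an open syslog file handle can be passed in directly.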