Hi,
I have a CentOS 6 server with a custom-compiled 3.3.6 kernel (config taken
from CentOS) that hosts about 40 LXC containers.
I compiled the kernel while trying (unsuccessfully) to resolve an OOM issue. With:
lxc.cgroup.memory.limit_in_bytes = 500M
lxc.cgroup.memory.memsw.limit_in_bytes = 500M
lxc.cgroup.memory.oom_control = 0
when memory usage rises above the limit, the OOM killer sometimes (often)
kills processes outside the container that triggered the limit.
To work around that issue I have configured the containers as follows:
lxc.utsname = test_oom
lxc.tty = 1
lxc.pts = 1024
lxc.rootfs = /lxc/containers/test_oom
lxc.mount = /conf/lxc/test_oom/fstab
#networking
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.name = eth0
lxc.network.mtu = 1500
lxc.network.ipv4 = X.X.X.X/27
lxc.network.hwaddr = xx:xx:xx:xx:xx:xx
lxc.network.veth.pair = veth-xxx
#cgroups
lxc.cgroup.devices.deny = a
# /dev/null and zero
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
# consoles
lxc.cgroup.devices.allow = c 5:1 rwm
# tty
lxc.cgroup.devices.allow = c 5:0 rwm
lxc.cgroup.devices.allow = c 4:0 rwm
lxc.cgroup.devices.allow = c 4:1 rwm
# /dev/{,u}random
lxc.cgroup.devices.allow = c 1:9 rwm
lxc.cgroup.devices.allow = c 1:8 rwm
lxc.cgroup.devices.allow = c 136:* rwm
lxc.cgroup.devices.allow = c 5:2 rwm
# rtc
lxc.cgroup.devices.allow = c 254:0 rwm
# cpu
lxc.cgroup.cpuset.cpus = 3
#mem
lxc.cgroup.memory.limit_in_bytes = 500M
lxc.cgroup.memory.memsw.limit_in_bytes = 500M
*lxc.cgroup.memory.oom_control = 1*
#capabilities
lxc.cap.drop = sys_module mac_override mac_admin
and I have written my own OOM killer, using eventfd, cgroup.event_control,
cgroup.procs, memory.oom_control, etc.
That program works well and kills the right processes. I have also set the
vfs_cache_pressure sysctl to its maximum possible value (
http://www.linuxinsight.com/proc_sys_vm_vfs_cache_pressure.html) to work around
a SLAB cache problem.
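For reference, here is a minimal sketch of how such a notifier can be wired up
(the cgroup path /cgroup/lxc/test_oom is only an example for this mail, and the
victim selection here is deliberately simplistic compared to my real program):

/*
 * Minimal eventfd-based OOM notifier for one memory cgroup
 * (cgroup v1 memory controller, see Documentation/cgroups/memory.txt).
 * The cgroup path is an example; adjust it to the real mount point.
 */
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>

#define CGROUP "/cgroup/lxc/test_oom"   /* example path */

int main(void)
{
    char buf[64];
    int efd, oom_fd, ctl_fd;

    /* eventfd that the kernel will signal when the cgroup hits OOM */
    efd = eventfd(0, 0);
    if (efd < 0) { perror("eventfd"); return 1; }

    /* fd of the cgroup's memory.oom_control file */
    oom_fd = open(CGROUP "/memory.oom_control", O_RDONLY);
    if (oom_fd < 0) { perror("open memory.oom_control"); return 1; }

    /* register "<event_fd> <oom_control_fd>" with cgroup.event_control */
    ctl_fd = open(CGROUP "/cgroup.event_control", O_WRONLY);
    if (ctl_fd < 0) { perror("open cgroup.event_control"); return 1; }
    snprintf(buf, sizeof(buf), "%d %d", efd, oom_fd);
    if (write(ctl_fd, buf, strlen(buf)) < 0) { perror("write event_control"); return 1; }

    for (;;) {
        uint64_t cnt;
        FILE *procs;
        int pid;

        /* block until the kernel reports an OOM condition in the cgroup */
        if (read(efd, &cnt, sizeof(cnt)) != sizeof(cnt)) { perror("read eventfd"); return 1; }

        /* pick a victim from cgroup.procs and kill it; here simply the
         * first pid listed, while the real program scores the candidates */
        procs = fopen(CGROUP "/cgroup.procs", "r");
        if (procs) {
            if (fscanf(procs, "%d", &pid) == 1 && pid > 1)
                kill(pid, SIGKILL);
            fclose(procs);
        }
    }
}

With memory.oom_control set to 1 the kernel is supposed to pause the cgroup's
tasks on OOM instead of killing anything, and the program above then frees
memory by killing a victim inside the container.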
But the OOM issue continues; for example, a few minutes ago the OOM killer
ran and wrote the following to syslog:
kernel: Out of memory: Kill process 19981 (httpd) score 12 or
sacrifice child
kernel: Killed process 20859 (httpd) total-vm:1022216kB,
anon-rss:416736kB, file-rss:124kB
kernel: httpd invoked oom-killer: gfp_mask=0x0, order=0,
oom_adj=0, oom_score_adj=0
kernel: httpd cpuset=<container> mems_allowed=0
kernel: Pid: 19987, comm: httpd Not tainted 3.3.6 #4
kernel: Call Trace:
kernel: [<ffffffff8110f07b>] dump_header+0x8b/0x1e0
kernel: [<ffffffff8110eb4f>] ? find_lock_task_mm+0x2f/0x80
kernel: [<ffffffff811f93c5>] ? security_capable_noaudit+0x15/0x20
kernel: [<ffffffff8110f8b5>] oom_kill_process+0x85/0x170
kernel: [<ffffffff8110fa9f>] out_of_memory+0xff/0x210
kernel: [<ffffffff8110fc75>] pagefault_out_of_memory+0xc5/0x110
kernel: [<ffffffff81041cfc>] mm_fault_error+0xbc/0x1b0
kernel: [<ffffffff81506873>] do_page_fault+0x3c3/0x460
kernel: [<ffffffff81171253>] ? sys_newfstat+0x33/0x40
kernel: [<ffffffff81503075>] page_fault+0x25/0x30
...
and it killed a process outside the container that invoked it. But I have
disabled the kernel OOM killer completely for the containers' processes!
I think this is a bug in the kernel code.
Each container has its own filesystem on an LV, like this:
/dev/mapper/lxcbox--01.mmfg.it--vg-test_oom on
/lxc/containers/test_oom type ext3 (rw)
and each container is a development server running init, rsyslog, mingetty,
apache, ssh and crontab.
Can someone help me understand where the problem with OOM is?
Thank you!
--
Davide Belloni