On 2017-01-12 Michal Hocko wrote:
> On Wed 11-01-17 16:52:32, Trevor Cordes wrote:
> [...]
> > I'm not sure how I can tell if my bug is because of memcgs so here
> > is a full first oom example (attached).
>
> 4.7 kernel doesn't contain 71c799f4982d ("mm: add per-zone lru list
> stat") so the OOM report will not tell us whether the Normal zone
> doesn't age active lists, unfortunatelly.

I compiled the patch Mel provided into the stock F23 kernel
4.8.13-100.fc23.i686+PAE and it ran for 2 nights. It didn't oom the
first night, but did the second night. So the bug persists even with
that patch. However, it does *seem* a bit "better" since it took 2
nights (usually takes only one, but maybe 10% of the time it does take
two) before oom'ing, *and* it allowed my reboot script to reboot it
cleanly when it saw the oom (which happens only 25% of the time).

I'm attaching the 4.8.13 oom message which should have the memcg info
(71c799f4982d) you are asking for above? Hopefully?

> You can easily check whether this is memcg related by trying to run
> the same workload with cgroup_disable=memory kernel command line
> parameter. This will put all the memcg specifics out of the way.

I will try booting now into cgroup_disable=memory to see if that helps
at all. I'll reply back in 48 hours, or when it oom's, whichever comes
first.

Also, should I bother trying the latest git HEAD to see if that solves
anything?

Thanks!
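
P.S. For anyone repeating the cgroup_disable=memory test: a minimal
sketch, assuming the usual Linux /proc layout, of how one might confirm
after reboot that the parameter actually took effect. The function names
are hypothetical helpers, not part of any kernel tooling; the only
things relied on are the contents of /proc/cmdline and the "enabled"
column of /proc/cgroups.

```python
from pathlib import Path


def cmdline_disables_memcg(cmdline: str) -> bool:
    """True if the kernel command line carries cgroup_disable=...memory..."""
    for token in cmdline.split():
        if token.startswith("cgroup_disable="):
            # The parameter accepts a comma-separated list of controllers.
            if "memory" in token.split("=", 1)[1].split(","):
                return True
    return False


def memcg_enabled(proc_cgroups: str):
    """Parse /proc/cgroups text.

    Returns True/False from the 'enabled' column of the memory
    controller row, or None if no memory row is listed.
    """
    for line in proc_cgroups.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        # Columns: subsys_name hierarchy num_cgroups enabled
        if len(fields) >= 4 and fields[0] == "memory":
            return fields[3] == "1"
    return None


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline")
    cgroups = Path("/proc/cgroups")
    if cmdline.exists() and cgroups.exists():
        print("requested on cmdline:", cmdline_disables_memcg(cmdline.read_text()))
        print("memory controller enabled:", memcg_enabled(cgroups.read_text()))
    else:
        print("no /proc/cmdline or /proc/cgroups here; nothing to check")
```

If the first line prints True and the second prints False, the boot
should be running without the memory controller, so any oom seen after
that point is presumably not coming from memcg specifics.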

[Attachment: oom2 (binary data)]