On Fri 28-07-17 21:59:50, Tetsuo Handa wrote:
> (Oops. Forgot to add CC.)
> 
> On 2017/07/28 21:32, Michal Hocko wrote:
> > [CC linux-mm]
> >
> > On Fri 28-07-17 17:22:25, Manish Jaggi wrote:
> >> was: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
> >>
> >> Hi Michal,
> >> On 7/27/2017 2:54 PM, Michal Hocko wrote:
> >>> On Thu 27-07-17 13:59:09, Manish Jaggi wrote:
> >>> [...]
> >>>> With 4.11.6 I was getting random kernel panics (Out of memory - No 
> >>>> process left to kill),
> >>>>  when running LTP oom01 /oom02 ltp tests on our arm64 hardware with 
> >>>> ~256G memory and high core count.
> >>>> The issue experienced was as follows
> >>>>  that either test (oom01/oom02) selected a pid as victim and waited for 
> >>>> the pid to be killed.
> >>>>  that pid was marked as killed but somewhere there is a race and the 
> >>>> process didnt get killed.
> >>>>  and the oom01/oom02 test started killing further processes, till it 
> >>>> panics.
> >>>> IIUC this issue is quite similar to your patch description. But applying 
> >>>> your patch I still see the issue.
> >>>> If it is not related to this patch, can you please suggest by looking at 
> >>>> the log, what could be preventing
> >>>> the killing of victim.
> >>>>
> >>>> Log (https://pastebin.com/hg5iXRj2)
> >>>>
> >>>> As a subtest of oom02 starts, it prints out the victim - In this case 
> >>>> 4578
> >>>>
> >>>> oom02       0  TINFO  :  start OOM testing for mlocked pages.
> >>>> oom02       0  TINFO  :  expected victim is 4578.
> >>>>
> >>>> When oom02 thread invokes oom-killer, it did select 4578  for killing...
> >>> I will definitely have a look. Can you report it in a separate email
> >>> thread please? Are you able to reproduce with the current Linus or
> >>> linux-next trees?
> >> Yes this issue is visible with linux-next.
> >
> > Could you provide the full kernel log from this run please? I do not
> > expect there to be much difference but just to be sure that the code I
> > am looking at matches logs.
> 
> 4578 is consuming memory as mlocked pages. But the OOM reaper cannot reclaim
> mlocked pages (i.e. can_madv_dontneed_vma() returns false due to VM_LOCKED), 
> can it?

You are absolutely right. I am pretty sure I've checked mlocked counter
as the first thing but that must be from one of the earlier oom reports.
My fault I haven't checked it in the critical one

[  365.267347] oom_reaper: reaped process 4578 (oom02), now 
anon-rss:131559616kB, file-rss:0kB, shmem-rss:0kB
[  365.282658] oom_reaper: reaped process 4583 (oom02), now 
anon-rss:131561664kB, file-rss:0kB, shmem-rss:0kB

and the above screemed about the fact I was just completely blind.

mlock pages handling is on my todo list for quite some time already but
I didn't get around it to implement that. mlock code is very tricky.
-- 
Michal Hocko
SUSE Labs

Reply via email to