On Thu, 28 May 2020 at 20:33, Michal Hocko <mho...@kernel.org> wrote:
>
> On Fri 22-05-20 02:23:09, Naresh Kamboju wrote:
> > My apology !
> > As per the test results history this problem started happening from
> > Bad : next-20200430 (still reproducible on next-20200519)
> > Good : next-20200429
> >
> > The git tree / tag used for testing is from linux next-20200430 tag and
> > reverted
> > following three patches and oom-killer problem fixed.
> >
> > Revert "mm, memcg: avoid stale protection values when cgroup is above
> > protection"
> > Revert "mm, memcg: decouple e{low,min} state mutations from protection
> > checks"
> > Revert
> > "mm-memcg-decouple-elowmin-state-mutations-from-protection-checks-fix"
>
> The discussion has fragmented and I got lost TBH.
> In
> http://lkml.kernel.org/r/ca+g9fyudwgzx50upd+wcsdehx9vi3hpksvbawbmgrzadb0p...@mail.gmail.com
> you have said that none of the added tracing output has triggered. Does
> this still hold? Because I still have a hard time to understand how
> those three patches could have the observed effects.

On the other email thread [1] this issue is concluded.

Yafang wrote on May 22 2020,

  Regarding the root cause, my guess is it makes a similar mistake that
  I tried to fix in the previous patch that the direct reclaimer read a
  stale protection value. But I don't think it is worth to add another
  fix. The best way is to revert this commit.

[1] [PATCH v3 2/2] mm, memcg: Decouple e{low,min} state mutations from
protection checks
https://lore.kernel.org/linux-mm/caloahbarz3nsur3mcnx_kbsf8ktpjhuf2kaata7mb7ocjaj...@mail.gmail.com/

- Naresh

> --
> Michal Hocko
> SUSE Labs