On Wed 11-01-17 20:32:12, David Rientjes wrote:
> When memory.move_charge_at_immigrate is enabled and precharges are
> depleted during move, mem_cgroup_move_charge_pte_range() will attempt to
> increase the size of the precharge.
> 
> This livelocks if reclaim fails and an oom-killed process attached to
> the destination memcg is trying to exit: the exit path needs
> cgroup_threadgroup_rwsem, which we hold for the duration of the move
> (we also livelock while holding mm->mmap_sem for read).

Is this really the case? try_charge will return with ENOMEM for
GFP_KERNEL requests and mem_cgroup_do_precharge will bail out. So how
exactly do we livelock? We do not depend on the exiting task to make
forward progress. Or am I missing something?
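
For reference, the bail-out path I mean looks roughly like this
(a condensed sketch of the current mem_cgroup_do_precharge(), not
verbatim):

	static int mem_cgroup_do_precharge(unsigned long count)
	{
		int ret;

		/* try to charge the whole batch at once first */
		ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_NORETRY, count);
		if (!ret) {
			mc.precharge += count;
			return ret;
		}

		/* fall back to charging one by one with reclaim */
		while (count--) {
			ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_NORETRY, 1);
			if (ret)
				return ret;	/* ENOMEM propagates to the caller */
			mc.precharge++;
		}
		return 0;
	}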

> Prevent precharges from ever looping by setting __GFP_NORETRY.  This was
> probably the intention of the GFP_KERNEL & ~__GFP_NORETRY, which is
> pointless as written.

Yes, the current code is clearly bogus; I really do not remember why
we ended up with this rather than GFP_KERNEL | __GFP_NORETRY.
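
Just to spell out why the mask is a no-op (per the flag definitions in
include/linux/gfp.h):

	#define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)

	/* __GFP_NORETRY is not part of GFP_KERNEL, so clearing it changes nothing: */
	GFP_KERNEL & ~__GFP_NORETRY	/* == GFP_KERNEL */

	/* what was presumably intended: */
	GFP_KERNEL | __GFP_NORETRY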
 
> This also restructures mem_cgroup_wait_acct_move() since it is not
> possible for mc.moving_task to be current.

Please separate this out to its own patch.

> Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge path")
> Signed-off-by: David Rientjes <rient...@google.com>

For the mem_cgroup_do_precharge part
Acked-by: Michal Hocko <mho...@suse.com>

> ---
>  mm/memcontrol.c | 32 +++++++++++++++++++-------------
>  1 file changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1125,18 +1125,19 @@ static bool mem_cgroup_under_move(struct mem_cgroup *memcg)
>  
>  static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
>  {
> -     if (mc.moving_task && current != mc.moving_task) {
> -             if (mem_cgroup_under_move(memcg)) {
> -                     DEFINE_WAIT(wait);
> -                     prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
> -                     /* moving charge context might have finished. */
> -                     if (mc.moving_task)
> -                             schedule();
> -                     finish_wait(&mc.waitq, &wait);
> -                     return true;
> -             }
> +     DEFINE_WAIT(wait);
> +
> +     if (likely(!mem_cgroup_under_move(memcg)))
> +             return false;
> +
> +     prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
> +     /* moving charge context might have finished. */
> +     if (mc.moving_task) {
> +             WARN_ON_ONCE(mc.moving_task == current);
> +             schedule();
>       }
> -     return false;
> +     finish_wait(&mc.waitq, &wait);
> +     return true;
>  }
>  
>  #define K(x) ((x) << (PAGE_SHIFT-10))
> @@ -4355,9 +4356,14 @@ static int mem_cgroup_do_precharge(unsigned long count)
>               return ret;
>       }
>  
> -     /* Try charges one by one with reclaim */
> +     /*
> +      * Try charges one by one with reclaim, but do not retry.  This avoids
> +      * looping forever when try_charge() cannot reclaim memory and the oom
> +      * killer defers while waiting for a process to exit which is trying to
> +      * acquire cgroup_threadgroup_rwsem in the exit path.
> +      */
>       while (count--) {
> -             ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_NORETRY, 1);
> +             ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
>               if (ret)
>                       return ret;
>               mc.precharge++;

-- 
Michal Hocko
SUSE Labs
