On Mon, Sep 09, 2019 at 07:22:15AM -0500, Eric W. Biederman wrote:

> Let me see if I can explain my confusion in terms of task_numa_compare.
>
> The function task_numa_compare now does:
>
>     rcu_read_lock();
>     cur = rcu_dereference(dst_rq->curr);
>
> Then it proceeds to examine a few fields of cur:
>
>     cur->flags;
>     cur->cpus_ptr;
>     cur->numa_group;
>     cur->numa_faults[];
>     cur->total_numa_faults;
>     cur->se.cfs_rq;
>
> My concern here is the ordering between setting rq->curr and setting
> those other fields. If we read values of those fields that were not
> current when we set cur, then I think task_numa_compare would give an
> incorrect answer.
That code is explicitly without serialization. One memory barrier isn't
going to make any difference whatsoever. Specifically, those numbers will
change _after_ we make it current, not before.

> From what I can see pick_next_task_fair does not mess with any of
> the fields that task_numa_compare uses. So the existing memory barrier
> should be more than sufficient for that case.

There should not be, but even if there is, load-balancing in general
(very much including the NUMA balancing) is completely unserialized and
prone to reading stale data at all times. This is on purpose; it
minimizes the interference of load-balancing, and if we make a 'wrong'
choice, the next pass can fix it up.

> So I think I just need to write up a patch 4/3 that replaces my
> "rcu_assign_pointer(rq->curr, next)" with "RCU_INIT_POINTER(rq->curr,
> next)". And includes a great big comment to that effect.
>
> Now maybe I am wrong. Maybe we can afford to see data that was changed
> before the task became rq->curr in task_numa_compare. But I don't think
> so. I really think it is that full memory barrier just a bit earlier
> in __schedule that is keeping us safe.
>
> Am I wrong?

Not insofar as we now both seem to agree on using RCU_INIT_POINTER();
but we're still disagreeing on why.

My argument is that since we can already obtain any task_struct pointer
through find_task_by_vpid() and friends, any access to its individual
members must already have serialization rules (or explicitly none, as
for load-balancing). I'm viewing this as just another variant of
find_task_by_vpid(), except we index by CPU instead of PID.

There can be changes to the task_struct between two invocations of
find_task_by_vpid(); any user will have to somehow deal with that. This
is no different.

Take for instance p->cpus_ptr; it has very specific serialization rules
(which that NUMA balancer thing blatantly ignores), as do a lot of
other task_struct fields.
The memory barrier in question isn't going to help with any of that.
Specifically, if we care about the consistency of the numbers changed by
pick_next_task(), we must acquire rq->lock (and of course, once we do
that, we can immediately use rq->curr without further RCU use).