On Tue 12-02-13 14:53:58, Johannes Weiner wrote:
[...]
> iteration:
> rcu_read_lock()
> dead_count = atomic_read(&hierarchy->dead_count)
> smp_rmb()
> previous = iterator->position
> if (iterator->dead_count != dead_count)
>    /* A cgroup in our hierarchy was killed, pointer might be dangling */
>    don't use iterator
> if (!tryget(&previous))
>    /* The cgroup is marked dead, don't use it */
>    don't use iterator
> next = find_next_and_tryget(hierarchy, &previous)
> /* what happens if destruction of next starts NOW? */

OK, I thought that this depends on the ordering of the CSS_DEACT_BIAS and
dead_count writes, because there is no memory ordering enforced between
those two. But it shouldn't matter, because we check both. If the
increment is seen first then we never get to css_tryget at all, and if the
css is deactivated before dead_count++ then css_tryget fails and we bail out.
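
To spell out why either ordering is covered, the load side I have in mind
looks roughly like this (a sketch only; the helper name is made up and the
fields follow the pseudocode above rather than the actual
struct mem_cgroup_reclaim_iter):

/*
 * Sketch: caller holds rcu_read_lock().  Returns the cached position
 * with a reference held, or NULL if it cannot be trusted.
 */
static struct mem_cgroup *
mem_cgroup_iter_load(struct mem_cgroup_reclaim_iter *iter,
		     struct mem_cgroup *root, int *dead_count)
{
	struct mem_cgroup *position;

	*dead_count = atomic_read(&root->dead_count);
	smp_rmb();		/* pairs with the wmb on the store side */
	position = iter->position;
	if (iter->dead_count != *dead_count) {
		/*
		 * dead_count++ was seen first: we never touch the cached
		 * pointer at all, so the ordering against CSS_DEACT_BIAS
		 * does not matter.
		 */
		position = NULL;
	} else if (position && !css_tryget(&position->css)) {
		/*
		 * CSS_DEACT_BIAS was seen first: the tryget fails and we
		 * again refuse to use the cached pointer.
		 */
		position = NULL;
	}
	return position;
}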

A more interesting ordering, however, is dead_count++ vs. the css_put from
the cgroup core. Say we have the following:

        CPU0                    CPU1                    CPU2

iter->position = A;
iter->dead_count = dead_count;
rcu_read_unlock()
return A

mem_cgroup_iter_break
  css_put(A)                                    bias(A)
                                                css_offline()
                                                css_put(A) // in cgroup_destroy_locked
                                                           // last ref and A will be freed
                        rcu_read_lock()
                        read parent->dead_count
                                                parent->dead_count++ // got reordered from css_offline
                        css_tryget(A) // kaboom

The reordering window is really huge, so I think this is impossible to
trigger in real life. Moreover, mem_cgroup_reparent_charges calls
mem_cgroup_start_move unconditionally, which in turn calls
synchronize_rcu(), which is a full barrier AFAIU, so dead_count++ cannot
be reordered ATM.
But should we rely on that? Shouldn't we add an smp_wmb()
after dead_count++, as I had in an earlier version of the patch?
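
To be explicit about what I mean, something along these lines on the
destruction side (again just a sketch; the helper name is made up and only
the placement of the barrier is the point):

/* Sketch: invalidate cached iterators for all ancestors of a dying memcg. */
static void mem_cgroup_invalidate_iterators(struct mem_cgroup *memcg)
{
	struct mem_cgroup *parent;

	for (parent = memcg; parent; parent = parent_mem_cgroup(parent))
		atomic_inc(&parent->dead_count);

	/*
	 * Order the increments before the final css_put() from
	 * cgroup_destroy_locked() drops the last reference and the group
	 * gets freed.  Without the barrier an iterator on another CPU
	 * can read the stale dead_count and css_tryget() an already
	 * freed css (the CPU0/CPU1/CPU2 scenario above).
	 */
	smp_wmb();
}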

> css_put(previous)
> iterator->position = next
> smp_wmb()
> iterator->dead_count = dead_count /* my suggestion, instead of a second atomic_read() */
> rcu_read_unlock()
> return next /* caller drops ref eventually, iterator->cgroup becomes weak */
> 
> destruction:
> bias(cgroup->refcount) /* disables future tryget */
> //synchronize_rcu() /* Michal's suggestion */
> atomic_inc(&cgroup->hierarchy->dead_count)
> synchronize_rcu()
> free(cgroup)

Other than that this should work. I will update the patch accordingly.
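
For completeness, the update side would then do roughly the following
(sketch; names follow the pseudocode in this thread rather than whatever
the final struct will look like):

/* Caller still under rcu_read_lock(); dead_count was sampled before the smp_rmb(). */
static void mem_cgroup_iter_update(struct mem_cgroup_reclaim_iter *iter,
				   struct mem_cgroup *last_visited,
				   struct mem_cgroup *new_position,
				   int dead_count)
{
	if (last_visited)
		css_put(&last_visited->css);
	iter->position = new_position;
	/*
	 * Pairs with the smp_rmb() on the load side: whoever sees the
	 * new dead_count is guaranteed to see the new position as well,
	 * so the dead_count check really covers the cached pointer.
	 */
	smp_wmb();
	iter->dead_count = dead_count;
}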

Thanks!
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to