On Tue, Apr 14, 2015 at 12:25:06PM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney <paul...@linux.vnet.ibm.com> wrote:
>
> > On Mon, Apr 13, 2015 at 11:21:46AM -0700, Linus Torvalds wrote:
> > > On Mon, Apr 13, 2015 at 10:43 AM, Paul E. McKenney
> > > <paul...@linux.vnet.ibm.com> wrote:
> > > >
> > > > A shorthand for READ_ONCE + smp_read_barrier_depends() is the shiny
> > > > new lockless_dereference()
> > > Related side note - I think people should get used to seeing
> > > "smp_load_acquire()". It has well-defined memory ordering properties
> > > and should generally perform well on most architectures. It's (much)
> > > stronger than lockless_dereference(), and together with
> > > smp_store_release() you can make rather clear guarantees about passing
> > > data locklessly from one CPU to another.
> > >
> > > I'd like to see us use more of the pattern of
> > >
> > >  - one thread does:
> > >
> > >      .. allocate/create some data
> > >      smp_store_release() to "expose it"
> > >
> > >  - another thread does:
> > >
> > >      smp_load_acquire() to read index/pointer/flag/whatever
> > >      .. use the data any damn way you want ..
> > >
> > > and we should probably aim to prefer that pattern over a lot of our
> > > traditional memory barriers.
> >
> > I couldn't agree more!
>
> /me too!
>
> > RCU made a similar move from open-coding smp_read_barrier_depends()
> > to using rcu_dereference() many years ago, and that change made RCU
> > code -much- easier to read and understand.  I believe that moving
> > from smp_mb(), smp_rmb(), and smp_wmb() to smp_store_release() and
> > smp_load_acquire() will provide similar maintainability benefits.
> > Furthermore, when the current code uses smp_mb(),
> > smp_store_release() and smp_load_acquire() generate faster code on
> > most architectures.
>
> A similar maintainability argument can be made for locking:
> spin_lock(x) was a big step forward compared to lock_kernel(),
> primarily not because it improves scalability (it often does), but
> because the '(x)' very clearly documents the data structure that is
> being accessed and makes locking and data access bugs a lot more
> visible in the review phase already.
>
> I wish rcu_read_lock() had a data argument, for similar reasons - even
> if it just pointed to a pre-existing lock or an rcu head it never
> touches ;-)
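As a concrete illustration of the release/acquire handoff that Linus
describes above, a minimal sketch might look like the following
("struct foo", "foo_ptr", and the function names are made up purely
for illustration):

#include <asm/barrier.h>	/* smp_store_release(), smp_load_acquire() */
#include <linux/slab.h>		/* kmalloc() */
#include <linux/printk.h>	/* pr_info() */

struct foo {
	int a;
};

static struct foo *foo_ptr;	/* shared pointer, initially NULL */

/* Writer: initialize the data, then publish it with release semantics. */
static void publish_foo(void)
{
	struct foo *p = kmalloc(sizeof(*p), GFP_KERNEL);

	if (!p)
		return;
	p->a = 42;
	/* All stores above are ordered before the pointer becomes visible. */
	smp_store_release(&foo_ptr, p);
}

/* Reader: if the acquire load sees the pointer, it also sees p->a == 42. */
static void consume_foo(void)
{
	struct foo *p = smp_load_acquire(&foo_ptr);

	if (p)
		pr_info("foo->a = %d\n", p->a);
}

And the '(&foo_ptr)' argument documents exactly which shared variable
carries the ordering, much as the '(x)' in spin_lock(x) documents the
data structure being protected.
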
Heh!  Jack Slingwine and I had that argument back in 1993.  I advocated
placing the update-side lock into the rcu_read_lock() equivalent, and he
responded by showing me use cases where (1) there were no update-side
locks and (2) there were many update-side locks, so that it was
impossible to select just one on the read side.  ;-)

However, DYNIX/ptx did not have anything like rcu_dereference() or
list_for_each_entry_rcu(), which perhaps can be used in your example
below.  (Hey, that was 20 years ago, when 50MB was a lot of main memory.
So we relied on compilers being quite dumb.)

> As an example I picked a random file out of the kernel that uses RCU:
> kernel/cpuset.c::validate_change():
>
> static int validate_change(struct cpuset *cur, struct cpuset *trial)
> {
> 	struct cgroup_subsys_state *css;
> 	struct cpuset *c, *par;
> 	int ret;
>
> 	rcu_read_lock();
>
> 	/* Each of our child cpusets must be a subset of us */
> 	ret = -EBUSY;
> 	cpuset_for_each_child(c, css, cur)
> 		if (!is_cpuset_subset(c, trial))
> 			goto out;
>
> 	/* Remaining checks don't apply to root cpuset */
> 	ret = 0;
> 	if (cur == &top_cpuset)
> 		goto out;
>
> 	par = parent_cs(cur);
>
> 	/* On legacy hiearchy, we must be a subset of our parent cpuset. */
> 	ret = -EACCES;
> 	if (!cgroup_on_dfl(cur->css.cgroup) && !is_cpuset_subset(trial, par))
> 		goto out;
>
> 	/*
> 	 * If either I or some sibling (!= me) is exclusive, we can't
> 	 * overlap
> 	 */
> 	ret = -EINVAL;
> 	cpuset_for_each_child(c, css, par) {
> 		if ((is_cpu_exclusive(trial) || is_cpu_exclusive(c)) &&
> 		    c != cur &&
> 		    cpumask_intersects(trial->cpus_allowed, c->cpus_allowed))
> 			goto out;
> 		if ((is_mem_exclusive(trial) || is_mem_exclusive(c)) &&
> 		    c != cur &&
> 		    nodes_intersects(trial->mems_allowed, c->mems_allowed))
> 			goto out;
> 	}
>
> 	/*
> 	 * Cpusets with tasks - existing or newly being attached - can't
> 	 * be changed to have empty cpus_allowed or mems_allowed.
> 	 */
> 	ret = -ENOSPC;
> 	if ((cgroup_has_tasks(cur->css.cgroup) || cur->attach_in_progress)) {
> 		if (!cpumask_empty(cur->cpus_allowed) &&
> 		    cpumask_empty(trial->cpus_allowed))
> 			goto out;
> 		if (!nodes_empty(cur->mems_allowed) &&
> 		    nodes_empty(trial->mems_allowed))
> 			goto out;
> 	}
>
> 	/*
> 	 * We can't shrink if we won't have enough room for SCHED_DEADLINE
> 	 * tasks.
> 	 */
> 	ret = -EBUSY;
> 	if (is_cpu_exclusive(cur) &&
> 	    !cpuset_cpumask_can_shrink(cur->cpus_allowed,
> 				       trial->cpus_allowed))
> 		goto out;
>
> 	ret = 0;
> out:
> 	rcu_read_unlock();
> 	return ret;
> }
>
> So just from taking a glance at that function can you tell me what is
> being RCU protected here?  I cannot, I can only guess that it must
> either be cpuset_for_each_child() or maybe the cpumasks or other
> internals.
>
> And if I search the file for call_rcu() it shows me nothing.  Only if I
> know that cpusets are integrated with cgroups and I search
> kernel/cgroup.c for call_rcu(), do I find:
>
> 	call_rcu(&css->rcu_head, css_free_rcu_fn);
>
> aha!
>
> ... or if I drill down 3 levels into cpuset_for_each_child() ->
> css_for_each_child() -> css_next_child() do I see the RCU iteration.

And I have felt that reviewing pain as well.  But shouldn't these API
members be tagged with "_rcu" to make that more clear?  Sort of like the
difference between list_for_each_entry() and list_for_each_entry_rcu()?

> It would have been a lot clearer from the onset, if I had a hint
> syntactically:
>
> 	rcu_read_lock(&css->rcu_head);
> 	...
> 	rcu_read_unlock(&css->rcu_head);

I cannot resist asking what you put there if the update side uses
synchronize_rcu()...  A NULL pointer?  A pointer to synchronize_rcu()?
Something else?  And what do you do in the not-uncommon case where
multiple RCU chains are being traversed in the same RCU read-side
critical section?  One approach would be to use varargs, I suppose.
Though with a hash table, list, or tree, you could have a -lot- of
->rcu_head structures to reference, and concurrent additions and
deletions mean that you wouldn't necessarily know which at
rcu_read_lock() time.
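For example, a read-side critical section like the following (purely
illustrative; "struct foo", "struct bar", and "foo_list" are made-up
names) walks two independent RCU-protected chains under a single
rcu_read_lock(), and it is the _rcu-suffixed accessors, rather than the
lock call itself, that say exactly which loads are protected:

#include <linux/rculist.h>	/* list_for_each_entry_rcu(), rcu_dereference() */
#include <linux/errno.h>

struct bar {
	int data;
};

struct foo {
	struct list_head node;
	int key;
	struct bar __rcu *bar;		/* second RCU-protected chain */
};

static LIST_HEAD(foo_list);		/* first RCU-protected chain */

static int lookup_bar_data(int key)
{
	struct foo *p;
	int ret = -ENOENT;

	rcu_read_lock();		/* which single ->rcu_head would this take? */
	list_for_each_entry_rcu(p, &foo_list, node) {
		if (p->key == key) {
			struct bar *b = rcu_dereference(p->bar);

			if (b)
				ret = b->data;
			break;
		}
	}
	rcu_read_unlock();
	return ret;
}
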
> beyond the reviewer bonus I bet this would allow some extra debugging
> as well (only enabled in debug kernels):
>
>  - for example to make sure we only access a field if _that field_ is
>    RCU locked (reducing the chance of having the right locking for
>    the wrong reason)

One possibility would be to mark each traversal of an RCU-protected
pointer.  Currently, if a multilinked structure is inserted in one shot,
only the initial pointer to that structure needs to have
rcu_dereference().  Otherwise, it is hard to tell exactly how far the
RCU protection is to extend.  (Been having too much fun with this sort
of thing in the standards committees...)

>  - we could possibly also build lockdep dependencies out of such
>    annotated RCU locking patterns.

Tell me more?

>  - RCU aware list walking primitives could auto-check that this
>    particular list is properly RCU locked.

For example, that a lock in the proper update class was held during the
corresponding update?

> This could be introduced gradually by using a different API name:
>
> 	rcu_lock(&css->rcu_head);
> 	...
> 	rcu_unlock(&css->rcu_head);
>
> (the 'read' is implied in RCU locking anyway.)

Agreed, a new API would be needed for something like this.

> ... and if you think this approach has any merit, I volunteer the perf
> and sched subsystems as guinea pigs! :-)

And rcutorture, not that it counts for much.

> What do you think?

I think that I don't yet fully understand your proposal.  ;-)

							Thanx, Paul