On Thu, 2014-02-27 at 17:02 -0800, Paul E. McKenney wrote: > On Thu, Feb 27, 2014 at 09:50:21AM -0800, Paul E. McKenney wrote: > > On Thu, Feb 27, 2014 at 04:37:33PM +0100, Torvald Riegel wrote: > > > xagsmtp2.20140227154925.3...@vmsdvm9.vnet.ibm.com > > > > > > On Mon, 2014-02-24 at 11:54 -0800, Linus Torvalds wrote: > > > > On Mon, Feb 24, 2014 at 10:53 AM, Paul E. McKenney > > > > <paul...@linux.vnet.ibm.com> wrote: > > > > > > > > > > Good points. How about the following replacements? > > > > > > > > > > 3. Adding or subtracting an integer to/from a chained pointer > > > > > results in another chained pointer in that same pointer chain. > > > > > The results of addition and subtraction operations that cancel > > > > > the chained pointer's value (for example, "p-(long)p" where > > > > > "p" > > > > > is a pointer to char) are implementation defined. > > > > > > > > > > 4. Bitwise operators ("&", "|", "^", and I suppose also "~") > > > > > applied to a chained pointer and an integer for the purposes > > > > > of alignment and pointer translation results in another > > > > > chained pointer in that same pointer chain. Other uses > > > > > of bitwise operators on chained pointers (for example, > > > > > "p|~0") are implementation defined. > > > > > > > > Quite frankly, I think all of this language that is about the actual > > > > operations is irrelevant and wrong. > > > > > > > > It's not going to help compiler writers, and it sure isn't going to > > > > help users that read this. > > > > > > > > Why not just talk about "value chains" and that any operations that > > > > restrict the value range severely end up breaking the chain. There is > > > > no point in listing the operations individually, because every single > > > > operation *can* restrict things. Listing individual operations and > > > > depdendencies is just fundamentally wrong. > > > > > > [...] > > > > > > > The *only* thing that matters for all of them is whether they are > > > > "value-preserving", or whether they drop so much information that the > > > > compiler might decide to use a control dependency instead. That's true > > > > for every single one of them. > > > > > > > > Similarly, actual true control dependencies that limit the problem > > > > space sufficiently that the actual pointer value no longer has > > > > significant information in it (see the above example) are also things > > > > that remove information to the point that only a control dependency > > > > remains. Even when the value itself is not modified in any way at all. > > > > > > I agree that just considering syntactic properties of the program seems > > > to be insufficient. Making it instead depend on whether there is a > > > "semantic" dependency due to a value being "necessary" to compute a > > > result seems better. However, whether a value is "necessary" might not > > > be obvious, and I understand Paul's argument that he does not want to > > > have to reason about all potential compiler optimizations. Thus, I > > > believe we need to specify when a value is "necessary". > > > > > > I have a suggestion for a somewhat different formulation of the feature > > > that you seem to have in mind, which I'll discuss below. Excuse the > > > verbosity of the following, but I'd rather like to avoid > > > misunderstandings than save a few words. > > > > Thank you very much for putting this forward! I must confess that I was > > stuck, and my earlier attempt now enshrined in the C11 and C++11 standards > > is quite clearly way bogus. > > > > One possible saving grace: From discussions at the standards committee > > meeting a few weeks ago, there is a some chance that the committee will > > be willing to do a rip-and-replace on the current memory_order_consume > > wording, without provisions for backwards compatibility with the current > > bogosity. > > > > > What we'd like to capture is that a value originating from a mo_consume > > > load is "necessary" for a computation (e.g., it "cannot" be replaced > > > with value predictions and/or control dependencies); if that's the case > > > in the program, we can reasonably assume that a compiler implementation > > > will transform this into a data dependency, which will then lead to > > > ordering guarantees by the HW. > > > > > > However, we need to specify when a value is "necessary". We could say > > > that this is implementation-defined, and use a set of litmus tests > > > (e.g., like those discussed in the thread) to roughly carve out what a > > > programmer could expect. This may even be practical for a project like > > > the Linux kernel that follows strict project-internal rules and pays a > > > lot of attention to what the particular implementations of compilers > > > expected to compile the kernel are doing. However, I think this > > > approach would be too vague for the standard and for many other > > > programs/projects. > > > > I agree that a number of other projects would have more need for this than > > might the kernel. Please understand that this is in no way denigrating > > the intelligence of other projects' members. It is just that many of > > them have only recently started seriously thinking about concurrency. > > In contrast, the Linux kernel community has been doing concurrency since > > the mid-1990s. Projects with less experience with concurrency will > > probably need more help, from the compiler and from elsewhere as well. > > > > Your proposal looks quite promising at first glance. But rather than > > try and comment on it immediately, I am going to take a number of uses of > > RCU from the Linux kernel and apply your proposal to them, then respond > > with the results > > And here is an initial set of six selected randomly from the Linux kernel > (assuming you trust awk's random-number generator). This is of course a > tiny subset of what is in the kernel, but should be a good set to start > with. Looks like a reasonable start to me, though I would not expect the > kernel to convert over wholesale any time soon. Which is OK, there are > userspace projects using RCU. > > Thoughts? Am I understanding your proposal at all? ;-)
Thanks for the examples. I do think you understand the proposal :) We'll probably have to find more input regarding which implicit casts to allow between value_dep_preserving and non-value_dep_preserving expressions. This seems to affect the annotation overhead, and perhaps also which simple bugs a compiler could catch or warn about. > ------------------------------------------------------------------------ > > value_dep_preserving usage examples. > > /* The following is approximate -- need sparse and maybe other checking. */ > #define rcu_dereference(x) atomic_load_explicit(&(x), memory_order_consume) > > 1. mm/vmalloc.c __purge_vmap_area_lazy() > > This requires only two small changes. I am not sure that the > second change is necessary. My guess is that it is, on the theory > that passing a non-value_dep_preserving variable in through a > value_dep_preserving argument is guaranteed OK, give or take > unnecessary suppression of optimizations, It would be disallowed by what I proposed because one would cast non-value_dep_preserving (non-vdp) to value_dep_preserving (vdp), and one could argue that this may be an indication for a bug. However, I see your point that it wouldn't actually be a problem if none of the code expects to make use of value-dependency-based ordering. I don't have a strong opinion how to handle this. * If we disallow non-vdp to vdp casts, we might get code duplication for helper functions that are used in both cases. On the upside, we might catch a few bugs. * If we allow non-vdp to vdp casts, we can still let the compiler emit warnings if this happens (maybe with no warning if there is an explicit cast). > but that passing a > value_dep_preserving in through a non-value_dep_preserving could > be a serious bug. Yes, unless you don't expect any ordering for uses of the non-vdp. (Which is where my thought about using a custom macro/function to make accesses that depend on ordering comes from -- if those are know, the macro/function can check that the incoming type is indeed vdp-typed.) > That said, the Linux kernel convention is that once you leave > the outermost rcu_read_lock(), there is no more value dependency > preservation. Implementing this convention would remove the > need for the cast, for whatever that is worth. > > static void __purge_vmap_area_lazy(...) > { > static DEFINE_SPINLOCK(purge_lock); > LIST_HEAD(valist); > struct vmap_area value_dep_preserving *va; /* CHANGE */ > struct vmap_area *n_va; > int nr = 0; > > ... > > rcu_read_lock(); > list_for_each_entry_rcu(va, &vmap_area_list, list) { > if (va->flags & VM_LAZY_FREE) { > if (va->va_start < *start) > *start = va->va_start; > if (va->va_end > *end) > *end = va->va_end; > nr += (va->va_end - va->va_start) >> PAGE_SHIFT; > list_add_tail(&va->purge_list, &valist); > va->flags |= VM_LAZY_FREEING; > va->flags &= ~VM_LAZY_FREE; > } > } > rcu_read_unlock(); > > ... > > if (nr) { > spin_lock(&vmap_area_lock); > list_for_each_entry_safe(va, n_va, &valist, purge_list) > __free_vmap_area((struct vmap_area *)va); /* > CHANGE */ I had assumed that such casts wouldn't be necessary, but I agree that one can see this either way. > spin_unlock(&vmap_area_lock); > } > ... > } > > 2. net/core/sock.c sock_def_wakeup() > > static void sock_def_wakeup(struct sock *sk) > { > struct socket_wq value_dep_preserving *wq; /* CHANGE */ > > rcu_read_lock(); > wq = rcu_dereference(sk->sk_wq); > if (wq_has_sleeper(wq)) > wake_up_interruptible_all(&((struct socket_wq > *)wq->wait)); /* CHANGE */ See above. If we decide that vdp to non-vdp casts are allowed, this change isn't required. > rcu_read_unlock(); > } > > This calls wq_has_sleeper(): > > static inline bool > wq_has_sleeper(struct socket_wq value_dep_preserving *wq) /* CHANGE */ > { > /* We need to be sure we are in sync with the > * add_wait_queue modifications to the wait queue. > * > * This memory barrier is paired in the sock_poll_wait. > */ > smp_mb(); > return wq && waitqueue_active(&wq->wait); > } > > Although wq_has_sleeper() has a full memory barrier and therefore > does not need value_dep_preserving, it seems to be called from > within a lot of RCU read-side critical sections, so it is likely > a bit nicer to decorate its argument than to apply casts to all > calls. > > Is "&wq->wait" also value_dep_preserving? It would be with the proposed wording, and I believe it should be. We're computing just an adress, so ISTM this isn't much different from (wq + x). > The call to > wake_up_interruptible_all() doesn't need it to be because it > acquires a spinlock within the structure, and the ensuing barriers > and atomic instructions enforce all the ordering that is required. > > The call waitqueue_active() is preceded by a full barrier, so > it too has all the ordering it needs. > > So this example is slightly nicer if, given a value_dep_preserving > pointer "p", "&p->f" is non-value_dep_preserving. But there > will likely be other examples that strongly prefer otherwise, > and the pair of casts required here are not that horrible. > Plus this approach matches our discussion earlier in this thread. > > Might be nice to have an intrinsic that takes a value_dep_preserving > pointer and returns a non-value_dep_preserving pointer with the > same value. And/or vice versa, depending on what rules we set up for casts. > > 3. net/ipv4/tcp_cong.c tcp_init_congestion_control() > > This example requires only one change. I am assuming > that if "p" is value_dep_preserving, then "p->a" is -not- > value_dep_preserving unless the struct field "a" has been > declared value_dep_preserving. I agree. > This means that there is no > need to change the "try_module_get(ca->owner)", which is a nice > improvement over the current memory_order_consume work. > > I believe that this approach will do much to trim what would > otherwise be over-large value_dep_preserving regions. > > void tcp_init_congestion_control(struct sock *sk) > { > struct inet_connection_sock *icsk = inet_csk(sk); > struct tcp_congestion_ops value_dep_preserving *ca; /* CHANGE */ > > if (icsk->icsk_ca_ops == &tcp_init_congestion_ops) { > rcu_read_lock(); > list_for_each_entry_rcu(ca, &tcp_cong_list, list) { > if (try_module_get(ca->owner)) { > icsk->icsk_ca_ops = ca; > break; > } > > /* fallback to next available */ > } > rcu_read_unlock(); > } > > if (icsk->icsk_ca_ops->init) > icsk->icsk_ca_ops->init(sk); > } > > 4. drivers/target/tcm_fc/tfc_sess.c ft_sess_get() > > This one brings up an interesting point. I am guessing that > the value_dep_preserving should be silently stripped when the > value is passed in through __VA_ARGS__, as in pr_debug() below. > > Thoughts? It would if we allow implicit casts from non-vdp to vdp.