On Thu, 2014-02-27 at 17:02 -0800, Paul E. McKenney wrote:
> On Thu, Feb 27, 2014 at 09:50:21AM -0800, Paul E. McKenney wrote:
> > On Thu, Feb 27, 2014 at 04:37:33PM +0100, Torvald Riegel wrote:
> > > [email protected]
> > >
> > > On Mon, 2014-02-24 at 11:54 -0800, Linus Torvalds wrote:
> > > > On Mon, Feb 24, 2014 at 10:53 AM, Paul E. McKenney
> > > > <[email protected]> wrote:
> > > > >
> > > > > Good points. How about the following replacements?
> > > > >
> > > > > 3. Adding or subtracting an integer to/from a chained pointer
> > > > > results in another chained pointer in that same pointer chain.
> > > > > The results of addition and subtraction operations that cancel
> > > > > the chained pointer's value (for example, "p-(long)p" where
> > > > > "p"
> > > > > is a pointer to char) are implementation defined.
> > > > >
> > > > > 4. Bitwise operators ("&", "|", "^", and I suppose also "~")
> > > > > applied to a chained pointer and an integer for the purposes
> > > > > of alignment and pointer translation results in another
> > > > > chained pointer in that same pointer chain. Other uses
> > > > > of bitwise operators on chained pointers (for example,
> > > > > "p|~0") are implementation defined.
> > > >
> > > > Quite frankly, I think all of this language that is about the actual
> > > > operations is irrelevant and wrong.
> > > >
> > > > It's not going to help compiler writers, and it sure isn't going to
> > > > help users that read this.
> > > >
> > > > Why not just talk about "value chains" and that any operations that
> > > > restrict the value range severely end up breaking the chain. There is
> > > > no point in listing the operations individually, because every single
> > > > operation *can* restrict things. Listing individual operations and
> > > > depdendencies is just fundamentally wrong.
> > >
> > > [...]
> > >
> > > > The *only* thing that matters for all of them is whether they are
> > > > "value-preserving", or whether they drop so much information that the
> > > > compiler might decide to use a control dependency instead. That's true
> > > > for every single one of them.
> > > >
> > > > Similarly, actual true control dependencies that limit the problem
> > > > space sufficiently that the actual pointer value no longer has
> > > > significant information in it (see the above example) are also things
> > > > that remove information to the point that only a control dependency
> > > > remains. Even when the value itself is not modified in any way at all.
> > >
> > > I agree that just considering syntactic properties of the program seems
> > > to be insufficient. Making it instead depend on whether there is a
> > > "semantic" dependency due to a value being "necessary" to compute a
> > > result seems better. However, whether a value is "necessary" might not
> > > be obvious, and I understand Paul's argument that he does not want to
> > > have to reason about all potential compiler optimizations. Thus, I
> > > believe we need to specify when a value is "necessary".
> > >
> > > I have a suggestion for a somewhat different formulation of the feature
> > > that you seem to have in mind, which I'll discuss below. Excuse the
> > > verbosity of the following, but I'd rather like to avoid
> > > misunderstandings than save a few words.
> >
> > Thank you very much for putting this forward! I must confess that I was
> > stuck, and my earlier attempt now enshrined in the C11 and C++11 standards
> > is quite clearly way bogus.
> >
> > One possible saving grace: From discussions at the standards committee
> > meeting a few weeks ago, there is a some chance that the committee will
> > be willing to do a rip-and-replace on the current memory_order_consume
> > wording, without provisions for backwards compatibility with the current
> > bogosity.
> >
> > > What we'd like to capture is that a value originating from a mo_consume
> > > load is "necessary" for a computation (e.g., it "cannot" be replaced
> > > with value predictions and/or control dependencies); if that's the case
> > > in the program, we can reasonably assume that a compiler implementation
> > > will transform this into a data dependency, which will then lead to
> > > ordering guarantees by the HW.
> > >
> > > However, we need to specify when a value is "necessary". We could say
> > > that this is implementation-defined, and use a set of litmus tests
> > > (e.g., like those discussed in the thread) to roughly carve out what a
> > > programmer could expect. This may even be practical for a project like
> > > the Linux kernel that follows strict project-internal rules and pays a
> > > lot of attention to what the particular implementations of compilers
> > > expected to compile the kernel are doing. However, I think this
> > > approach would be too vague for the standard and for many other
> > > programs/projects.
> >
> > I agree that a number of other projects would have more need for this than
> > might the kernel. Please understand that this is in no way denigrating
> > the intelligence of other projects' members. It is just that many of
> > them have only recently started seriously thinking about concurrency.
> > In contrast, the Linux kernel community has been doing concurrency since
> > the mid-1990s. Projects with less experience with concurrency will
> > probably need more help, from the compiler and from elsewhere as well.
> >
> > Your proposal looks quite promising at first glance. But rather than
> > try and comment on it immediately, I am going to take a number of uses of
> > RCU from the Linux kernel and apply your proposal to them, then respond
> > with the results
>
> And here is an initial set of six selected randomly from the Linux kernel
> (assuming you trust awk's random-number generator). This is of course a
> tiny subset of what is in the kernel, but should be a good set to start
> with. Looks like a reasonable start to me, though I would not expect the
> kernel to convert over wholesale any time soon. Which is OK, there are
> userspace projects using RCU.
>
> Thoughts? Am I understanding your proposal at all? ;-)
Thanks for the examples. I do think you understand the proposal :)
We'll probably have to find more input regarding which implicit casts to
allow between value_dep_preserving and non-value_dep_preserving
expressions. This seems to affect the annotation overhead, and perhaps
also which simple bugs a compiler could catch or warn about.
> ------------------------------------------------------------------------
>
> value_dep_preserving usage examples.
>
> /* The following is approximate -- need sparse and maybe other checking. */
> #define rcu_dereference(x) atomic_load_explicit(&(x), memory_order_consume)
>
> 1. mm/vmalloc.c __purge_vmap_area_lazy()
>
> This requires only two small changes. I am not sure that the
> second change is necessary. My guess is that it is, on the theory
> that passing a non-value_dep_preserving variable in through a
> value_dep_preserving argument is guaranteed OK, give or take
> unnecessary suppression of optimizations,
It would be disallowed by what I proposed because one would cast
non-value_dep_preserving (non-vdp) to value_dep_preserving (vdp), and
one could argue that this may be an indication for a bug. However, I
see your point that it wouldn't actually be a problem if none of the
code expects to make use of value-dependency-based ordering.
I don't have a strong opinion how to handle this.
* If we disallow non-vdp to vdp casts, we might get code duplication for
helper functions that are used in both cases. On the upside, we might
catch a few bugs.
* If we allow non-vdp to vdp casts, we can still let the compiler emit
warnings if this happens (maybe with no warning if there is an explicit
cast).
> but that passing a
> value_dep_preserving in through a non-value_dep_preserving could
> be a serious bug.
Yes, unless you don't expect any ordering for uses of the non-vdp.
(Which is where my thought about using a custom macro/function to make
accesses that depend on ordering comes from -- if those are know, the
macro/function can check that the incoming type is indeed vdp-typed.)
> That said, the Linux kernel convention is that once you leave
> the outermost rcu_read_lock(), there is no more value dependency
> preservation. Implementing this convention would remove the
> need for the cast, for whatever that is worth.
>
> static void __purge_vmap_area_lazy(...)
> {
> static DEFINE_SPINLOCK(purge_lock);
> LIST_HEAD(valist);
> struct vmap_area value_dep_preserving *va; /* CHANGE */
> struct vmap_area *n_va;
> int nr = 0;
>
> ...
>
> rcu_read_lock();
> list_for_each_entry_rcu(va, &vmap_area_list, list) {
> if (va->flags & VM_LAZY_FREE) {
> if (va->va_start < *start)
> *start = va->va_start;
> if (va->va_end > *end)
> *end = va->va_end;
> nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
> list_add_tail(&va->purge_list, &valist);
> va->flags |= VM_LAZY_FREEING;
> va->flags &= ~VM_LAZY_FREE;
> }
> }
> rcu_read_unlock();
>
> ...
>
> if (nr) {
> spin_lock(&vmap_area_lock);
> list_for_each_entry_safe(va, n_va, &valist, purge_list)
> __free_vmap_area((struct vmap_area *)va); /*
> CHANGE */
I had assumed that such casts wouldn't be necessary, but I agree that
one can see this either way.
> spin_unlock(&vmap_area_lock);
> }
> ...
> }
>
> 2. net/core/sock.c sock_def_wakeup()
>
> static void sock_def_wakeup(struct sock *sk)
> {
> struct socket_wq value_dep_preserving *wq; /* CHANGE */
>
> rcu_read_lock();
> wq = rcu_dereference(sk->sk_wq);
> if (wq_has_sleeper(wq))
> wake_up_interruptible_all(&((struct socket_wq
> *)wq->wait)); /* CHANGE */
See above. If we decide that vdp to non-vdp casts are allowed, this
change isn't required.
> rcu_read_unlock();
> }
>
> This calls wq_has_sleeper():
>
> static inline bool
> wq_has_sleeper(struct socket_wq value_dep_preserving *wq) /* CHANGE */
> {
> /* We need to be sure we are in sync with the
> * add_wait_queue modifications to the wait queue.
> *
> * This memory barrier is paired in the sock_poll_wait.
> */
> smp_mb();
> return wq && waitqueue_active(&wq->wait);
> }
>
> Although wq_has_sleeper() has a full memory barrier and therefore
> does not need value_dep_preserving, it seems to be called from
> within a lot of RCU read-side critical sections, so it is likely
> a bit nicer to decorate its argument than to apply casts to all
> calls.
>
> Is "&wq->wait" also value_dep_preserving?
It would be with the proposed wording, and I believe it should be.
We're computing just an adress, so ISTM this isn't much different from
(wq + x).
> The call to
> wake_up_interruptible_all() doesn't need it to be because it
> acquires a spinlock within the structure, and the ensuing barriers
> and atomic instructions enforce all the ordering that is required.
>
> The call waitqueue_active() is preceded by a full barrier, so
> it too has all the ordering it needs.
>
> So this example is slightly nicer if, given a value_dep_preserving
> pointer "p", "&p->f" is non-value_dep_preserving. But there
> will likely be other examples that strongly prefer otherwise,
> and the pair of casts required here are not that horrible.
> Plus this approach matches our discussion earlier in this thread.
>
> Might be nice to have an intrinsic that takes a value_dep_preserving
> pointer and returns a non-value_dep_preserving pointer with the
> same value.
And/or vice versa, depending on what rules we set up for casts.
>
> 3. net/ipv4/tcp_cong.c tcp_init_congestion_control()
>
> This example requires only one change. I am assuming
> that if "p" is value_dep_preserving, then "p->a" is -not-
> value_dep_preserving unless the struct field "a" has been
> declared value_dep_preserving.
I agree.
> This means that there is no
> need to change the "try_module_get(ca->owner)", which is a nice
> improvement over the current memory_order_consume work.
>
> I believe that this approach will do much to trim what would
> otherwise be over-large value_dep_preserving regions.
>
> void tcp_init_congestion_control(struct sock *sk)
> {
> struct inet_connection_sock *icsk = inet_csk(sk);
> struct tcp_congestion_ops value_dep_preserving *ca; /* CHANGE */
>
> if (icsk->icsk_ca_ops == &tcp_init_congestion_ops) {
> rcu_read_lock();
> list_for_each_entry_rcu(ca, &tcp_cong_list, list) {
> if (try_module_get(ca->owner)) {
> icsk->icsk_ca_ops = ca;
> break;
> }
>
> /* fallback to next available */
> }
> rcu_read_unlock();
> }
>
> if (icsk->icsk_ca_ops->init)
> icsk->icsk_ca_ops->init(sk);
> }
>
> 4. drivers/target/tcm_fc/tfc_sess.c ft_sess_get()
>
> This one brings up an interesting point. I am guessing that
> the value_dep_preserving should be silently stripped when the
> value is passed in through __VA_ARGS__, as in pr_debug() below.
>
> Thoughts?
It would if we allow implicit casts from non-vdp to vdp.