On Tue, May 20, 2014 at 02:13:58PM -0700, Jarno Rajahalme wrote:
>
> On May 20, 2014, at 2:03 PM, Ben Pfaff <b...@nicira.com> wrote:
>
> > On Tue, May 20, 2014 at 01:22:45PM -0700, Jarno Rajahalme wrote:
> >>
> >> On May 20, 2014, at 9:48 AM, Ben Pfaff <b...@nicira.com> wrote:
> >>
> >>> On Tue, May 20, 2014 at 09:12:23AM -0700, Ben Pfaff wrote:
> >>>> On Fri, May 16, 2014 at 04:38:02PM -0700, Jarno Rajahalme wrote:
> >>>>>> +    if (b->nodes[slot] == node) {
> >>>>>> +        cmap_set_bucket(b, slot, cmap_node_next_protected(node), hash);
> >>>>>
> >>>>> 'hash' is not changing here, so we could just set the nodes:
> >>>>>
> >>>>>     b->nodes[slot] = cmap_node_next_protected(node);
> >>>>>
> >>>>> Btw, what is the rationale for the nodes pointers not being RCU
> >>>>> pointers?  If they were, it would seem possible to combine this
> >>>>> special case with the loop below.
> >>>>
> >>>> Good points.  I'll work on that for a v3.
> >>>
> >>> After thinking a little further, I am not sure that it would become
> >>> possible to combine them, because I think that the cases are a little
> >>> different:
> >>>
> >>>     * If we are removing the last node with a given hash, which is
> >>>       usually the case if node == b->nodes[slot], then we want to make
> >>>       sure that, from the viewpoint of any reader, the change is atomic
> >>>       (that is, the change to ->hashes[] and ->nodes[] together), by
> >>>       incrementing the counter around the change.  I am not absolutely
> >>>       certain that this is required, but the cost is minimal, so,
> >>>       lacking confidence, I prefer to do it.
> >>
> >> My point was that the hash need not be changed, if the dup insertion
> >> code is also changed to not care about the node value (see the other
> >> comment I just sent).
> >
> > OK.  Like this?
> >
> > diff --git a/lib/cmap.c b/lib/cmap.c
> > index 30a6e2d..d291ec5 100644
> > --- a/lib/cmap.c
> > +++ b/lib/cmap.c
> > @@ -648,7 +648,8 @@ cmap_remove__(struct cmap_impl *impl, struct cmap_node *node,
> >      }
> >
> >      if (b->nodes[slot] == node) {
> > -        cmap_set_bucket(b, slot, cmap_node_next_protected(node), hash);
> > +        b->nodes[slot] = cmap_node_next_protected(node);
> > +        atomic_thread_fence(memory_order_release);
>
> Yes.  Would an ovsrcu_set() include the release barrier, if nodes[]
> were RCU pointers?  (This relating to my earlier question about
Yes.  Maybe we really should make nodes[] into RCU pointers.

> >      } else {
> >          struct cmap_node *iter = b->nodes[slot];
> >
> >          for (;;) {
> >
> >>>
> >>>     * Otherwise, we are shortening a linked list, but not
> >>>       eliminating its slot, which does not affect readers in the
> >>>       same way, so an ordinary RCU store should suffice.
> >>>
> >>> What I'm increasingly uncertain about is whether cmap_find() is
> >>> correct.  The intention is that the atomic reads of the counters
> >>> before and after checking the nodes and the hashes should ensure that
> >>> the cache lines occupied by the buckets are stable.  I think that's
> >>> going to be true in practice, with current compiler technology.  But I
> >>> am not sure that the atomic reads of the counters actually mean that
> >>> the node and hash reads can't be moved outside the counter reads.  If
> >>> not, though, making the node reads atomic (via RCU) and even the hash
> >>> reads atomic (by making them atomic_uint32s) wouldn't help.  I think
> >>> that the only thing that would help would be adding explicit acquire
> >>> and release barriers.  That might actually, in conjunction with good
> >>> comments, be clearer than what we have now.
> >>>
> >>> What do you think?
> >>
> >> atomic_read() implies memory_order_seq_cst, which is stronger than
> >> memory_order_acquire.  A memory_order_acquire should guarantee that no
> >> memory operations after the atomic_read are moved before it, so we
> >> read the data only after reading an even counter.  When we re-read the
> >> counter to verify it has not changed, an acquire barrier would
> >> guarantee that no memory operations after the check are moved before
> >> it, but it would still be possible for the memory operations before it
> >> to be moved after it.  So the check needs a release barrier, even
> >> though we are only reading, to guarantee that memory operations before
> >> the check are not moved after it.
> >> The memory_order_seq_cst implied by
> >> atomic_read() does that, but is too strong; a memory_order_acq_rel
> >> should suffice, or even memory_order_acquire for the
> >> read_even_counter, and a memory_order_release for a
> >> "check_counter()".  Makes sense?
> >
> > Yes.  Like this?
> >
> > diff --git a/lib/cmap.c b/lib/cmap.c
> > index 30a6e2d..d291ec5 100644
> > --- a/lib/cmap.c
> > +++ b/lib/cmap.c
> > @@ -245,11 +245,11 @@ cmap_is_empty(const struct cmap *cmap)
> >  }
> >
> >  static uint32_t
> > -read_counter(struct cmap_bucket *bucket)
> > +read_counter(struct cmap_bucket *bucket, memory_order order)
> >  {
> >      uint32_t counter;
> >
> > -    atomic_read(&bucket->counter, &counter);
> > +    atomic_read_explicit(&bucket->counter, &counter, order);
> >      return counter;
> >  }
> >
> > @@ -259,7 +259,7 @@ read_even_counter(struct cmap_bucket *bucket)
> >      uint32_t counter;
> >
> >      do {
> > -        counter = read_counter(bucket);
> > +        counter = read_counter(bucket, memory_order_acquire);
> >      } while (OVS_UNLIKELY(counter & 1));
> >
> >      return counter;
> > @@ -291,7 +291,7 @@ retry:
> >      for (i = 0; i < CMAP_K; i++) {
> >          struct cmap_node *node = b1->nodes[i];
> >          if (node && b1->hashes[i] == hash) {
> > -            if (OVS_UNLIKELY(read_counter(b1) != c1)) {
> > +            if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1)) {
>
> I would find the code more balanced if read_even_counter() were paired
> with a "read_same_counter()", rather than a raw read_counter().  To be
> more specific, the explicit barrier should be visible in, or hidden by,
> a utility function the same way in both cases.

Fair enough.
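[Editor's note: the read-side protocol being converged on above is essentially a seqlock. Below is a minimal, standalone C11 sketch of that idea, with a toy `struct bucket` and names modeled on the patch's `read_counter()`, `read_even_counter()`, and the re-check; it is an illustration, not the actual OVS code. Standard C11 has no "release load", so this sketch expresses the re-check's ordering constraint with an acquire fence (the C11 analogue of a read barrier) before a relaxed re-read.]

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a cmap bucket: a version counter protecting one data
 * word.  A writer makes the counter odd while modifying 'value' and even
 * again when it is done. */
struct bucket {
    _Atomic(uint32_t) counter;
    uint32_t value;
};

static uint32_t
read_counter(struct bucket *b, memory_order order)
{
    return atomic_load_explicit(&b->counter, order);
}

/* An odd counter means a writer is mid-update, so spin until it is even.
 * The acquire ordering keeps the data reads that follow from being
 * hoisted above the counter read. */
static uint32_t
read_even_counter(struct bucket *b)
{
    uint32_t counter;

    do {
        counter = read_counter(b, memory_order_acquire);
    } while (counter & 1);
    return counter;
}

/* Re-check the counter after reading the data.  The thread argues this
 * needs "release"-like ordering so the data reads cannot sink below it;
 * here an acquire fence before a relaxed re-read provides the needed
 * load-load ordering in standard C11. */
static bool
counter_changed(struct bucket *b, uint32_t c)
{
    atomic_thread_fence(memory_order_acquire);
    return read_counter(b, memory_order_relaxed) != c;
}

/* Seqlock-style read: retry until a consistent snapshot is observed. */
static uint32_t
bucket_read(struct bucket *b)
{
    uint32_t c, value;

    do {
        c = read_even_counter(b);
        value = b->value;
    } while (counter_changed(b, c));
    return value;
}
```

The same shape appears in the final patch: `read_even_counter()` hides the acquire side, and a helper comparable to `counter_changed()` hides the re-check, so no raw barrier is visible at the call sites.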
I folded this in:

diff --git a/lib/cmap.c b/lib/cmap.c
index dcb82fc..bd1d516 100644
--- a/lib/cmap.c
+++ b/lib/cmap.c
@@ -265,6 +265,12 @@ read_even_counter(struct cmap_bucket *bucket)
     return counter;
 }
 
+static bool
+counter_changed(const struct cmap_bucket *b, uint32_t c)
+{
+    return OVS_UNLIKELY(read_counter(b, memory_order_release) != c);
+}
+
 /* Searches 'cmap' for an element with the specified 'hash'.  If one or more is
  * found, returns a pointer to the first one, otherwise a null pointer.  All of
  * the nodes on the returned list are guaranteed to have exactly the given
@@ -291,7 +297,7 @@ retry:
     for (i = 0; i < CMAP_K; i++) {
         struct cmap_node *node = b1->nodes[i];
         if (node && b1->hashes[i] == hash) {
-            if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1)) {
+            if (counter_changed(b1, c1)) {
                 goto retry;
             }
             return node;
@@ -303,15 +309,14 @@ retry:
     for (i = 0; i < CMAP_K; i++) {
         struct cmap_node *node = b2->nodes[i];
         if (node && b2->hashes[i] == hash) {
-            if (OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
+            if (counter_changed(b2, c2)) {
                 goto retry;
             }
             return node;
         }
     }
 
-    if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1) ||
-        OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
+    if (counter_changed(b1, c1) || counter_changed(b2, c2)) {
         goto retry;
     }
     return NULL;

> >                  goto retry;
> >              }
> >              return node;
> > @@ -303,15 +303,15 @@ retry:
> >      for (i = 0; i < CMAP_K; i++) {
> >          struct cmap_node *node = b2->nodes[i];
> >          if (node && b2->hashes[i] == hash) {
> > -            if (OVS_UNLIKELY(read_counter(b2) != c2)) {
> > +            if (OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
> >                  goto retry;
> >              }
> >              return node;
> >          }
> >      }
> >
> > -    if (OVS_UNLIKELY(read_counter(b1) != c1) ||
> > -        OVS_UNLIKELY(read_counter(b2) != c2)) {
> > +    if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1) ||
> > +        OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
> >          goto retry;
> >      }
> >      return NULL;
> >
> > I realize that
> > all these incrementals are a big mess.  I've pushed the
> > overall series to my "ovs-reviews" repo at
> > https://github.com/blp/ovs-reviews in the "cmap" branch.  I'll happily
> > repost a v3 if that is easiest for you.
>
> I'd rather you tell me that you have addressed the questions I made
> and push it :-)  I think it would be best at this point to get this in
> and gain some experience with it.  We can then enhance it as we see
> opportunities for it.

Thanks.  I pushed this patch.  We can fine-tune it (e.g. making nodes[]
into RCU pointers) later.

I didn't push patch 2 yet since you pointed out a bug.

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
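[Editor's note: for completeness, here is a sketch of the matching write side of the counter protocol that the thread's cmap_remove__() change relies on. The struct, field names, and the exact memory orders are illustrative assumptions, not the OVS implementation; the seq_cst fence is stronger than strictly necessary, as the thread debates for the read side.]

```c
#include <stdatomic.h>
#include <stdint.h>

/* Toy bucket for the write-side sketch; redeclared here so the example
 * stands alone. */
struct bucket {
    _Atomic(uint32_t) counter;
    uint32_t hash;
    uintptr_t node;             /* stand-in for one nodes[] slot */
};

/* Make the whole (hash, node) change appear atomic to readers: the
 * counter is odd for the entire duration of the update, so any reader
 * that overlaps it will retry. */
static void
bucket_set(struct bucket *b, uint32_t hash, uintptr_t node)
{
    uint32_t c = atomic_load_explicit(&b->counter, memory_order_relaxed);

    /* Counter goes odd: concurrent readers will spin or retry. */
    atomic_store_explicit(&b->counter, c + 1, memory_order_relaxed);

    /* Full fence so the data stores below cannot become visible before
     * the odd counter does. */
    atomic_thread_fence(memory_order_seq_cst);

    b->hash = hash;
    b->node = node;

    /* Release store: the data stores above are visible before the
     * counter turns even again. */
    atomic_store_explicit(&b->counter, c + 2, memory_order_release);
}
```

A single-threaded caller just sees the counter advance by two per update, with the new hash and node in place once it is even again.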