On Fri, May 20, 2016 at 22:48:11 -0400, Emilio G. Cota wrote:
> On Sat, May 21, 2016 at 01:13:20 +0300, Sergey Fedorov wrote:
> > > +static inline
> > > +void *qht_do_lookup(struct qht_bucket *head, qht_lookup_func_t func,
> > > +                    const void *userp, uint32_t hash)
> > > +{
> > > +    struct qht_bucket *b = head;
> > > +    int i;
> > > +
> > > +    do {
> > > +        for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
> > > +            if (atomic_read(&b->hashes[i]) == hash) {
> > > +                void *p = atomic_read(&b->pointers[i]);
> >
> > Why do we need this atomic_read() and the other (somewhat
> > inconsistent-looking) atomic operations on 'b->pointers' and 'b->hash',
> > if we always have to access them properly protected by a seqlock
> > together with a spinlock?
>
> [ There should be consistency: read accesses use the atomic ops to read,
>   while write accesses have acquired the bucket lock, so they don't need
>   them. Well, they need care when they write, since there may be
>   concurrent readers. ]
>
> I'm using atomic_read but what I really want is ACCESS_ONCE. That is:
> (1) Make sure that the accesses are done in a single instruction (even
>     though gcc doesn't explicitly guarantee it even for aligned addresses
>     anymore[1])
> (2) Make sure the pointer value is only read once, and never refetched.
>     This is what comes right after the pointer is read:
> > +                if (likely(p) && likely(func(p, userp))) {
> > +                    return p;
> > +                }
>     Refetching the pointer value might result in us passing a NULL p
>     value to the comparison function (since there may be concurrent
>     updaters!), with an immediate segfault. See [2] for a discussion on
>     this (essentially the compiler assumes that there's only a single
>     thread).
>
> Given that even reading a garbled hash is OK (we don't really need (1),
> since the seqlock will make us retry anyway), I've changed the code to:
>
>      for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
> -        if (atomic_read(&b->hashes[i]) == hash) {
> +        if (b->hashes[i] == hash) {
> +            /* make sure the pointer is read only once */
>              void *p = atomic_read(&b->pointers[i]);
>
>              if (likely(p) && likely(func(p, userp))) {
>
> Performance-wise this is the impact after 10 tries for:
>   $ taskset -c 0 tests/qht-bench \
>       -d 5 -n 1 -u 0 -k 4096 -K 4096 -l 4096 -r 4096 -s 4096
> on my Haswell machine I get, in Mops/s:
>   atomic_read() for all        40.389 +- 0.20888327415622
>   atomic_read(p) only          40.759 +- 0.212835356294224
>   no atomic_read(p) (unsafe)   40.559 +- 0.121422128680622
>
> Note that the unsafe version is slightly slower; I guess the CPU is
> trying to speculate too much and is gaining little from it.
>
> [1] "Linux-Kernel Memory Model" by Paul McKenney
>     http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4374.html
> [2] https://lwn.net/Articles/508991/
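
To make the retry argument above concrete, the read side ends up hanging
together roughly as below. This is only a sketch, not the patch code: the
wrapper isn't part of the quote, and I'm assuming the per-bucket seqlock is
reachable as head->sequence and that the seqlock_read_begin()/
seqlock_read_retry() helpers from qemu/seqlock.h are used:

    /* Sketch only: lockless lookup wrapped in a seqlock read section.
     * A garbled hash read at worst causes a false mismatch or a false
     * match (the latter is filtered by func()); either way
     * seqlock_read_retry() makes us loop again if a writer was active,
     * so the hashes need no atomic_read().  Only the pointer needs to be
     * read in a single load, which qht_do_lookup() above does with
     * atomic_read(&b->pointers[i]).
     */
    static void *qht_lookup_sketch(struct qht_bucket *head,
                                   qht_lookup_func_t func,
                                   const void *userp, uint32_t hash)
    {
        unsigned int version;
        void *p;

        do {
            version = seqlock_read_begin(&head->sequence);
            p = qht_do_lookup(head, func, userp, hash);
        } while (seqlock_read_retry(&head->sequence, version));
        return p;
    }
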
A small update: I just got rid of all the atomic_read/set's that apply to
the hashes, since seqlock retries will take care of possible races. The
atomic_read/set's remain only for b->pointers[], for the above reasons.

		E.
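
P.S. For completeness, this is roughly what the write side has to do for
the reasoning above to hold. Again a sketch with assumed names (the bucket
spinlock, head->sequence, seqlock_write_begin()/seqlock_write_end()), not
the actual insert path from the patch:

    /* Sketch only: writers serialize on the bucket spinlock and bump the
     * per-bucket seqlock around the update.  The hash can be a plain store
     * (readers racing with it will retry), but the pointer is published
     * with a single atomic_set() so a concurrent reader sees either NULL
     * or a fully valid pointer, never a torn value.
     */
    static bool qht_insert_sketch(struct qht_bucket *head, void *p,
                                  uint32_t hash)
    {
        bool inserted = false;
        int i;

        qemu_spin_lock(&head->lock);
        seqlock_write_begin(&head->sequence);
        for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
            if (head->pointers[i] == NULL) {       /* lock held: plain read is fine */
                head->hashes[i] = hash;            /* plain store: readers retry */
                atomic_set(&head->pointers[i], p); /* single store publishes the entry */
                inserted = true;
                break;
            }
        }
        seqlock_write_end(&head->sequence);
        qemu_spin_unlock(&head->lock);
        return inserted;
    }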