On Wed, Sep 06, 2017 at 09:13:04AM +0200, Andrea Parri wrote: > > + smp_mb__before_atomic(); > > I am wondering whether we should be using smp_wmb() instead: this would > provide the above guarantee and save a full barrier on powerpc/arm64.
Right, did that. > > + set_bit(REQ_ATOM_STARTED, &rq->atomic_flags); > > + if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) { > > + /* > > + * Coherence order guarantees these consequtive stores to a > > + * singe variable propagate in the specified order. Thus the > > + * clear_bit() is ordered _after_ the set bit. See > > + * blk_mq_check_expired(). > > + */ > > clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags); > > It could be useful to stress that set_bit(), clear_bit() must "act" on > the same subword of the unsigned long (whatever "subword" means at this > level...) to rely on the coherence order (c.f., alpha's implementation). As I wrote to your initial reply (which I now saw was private) I went through the architectures again and found h8300 to use byte ops to implement all the bitops. So subword here means byte :/ The last time we looked at this was for PG_waiter and back then I think we settled on u32 (with Alpha for example using 32bit ll/sc ops). Linus moved PG_waiters to the same byte though, so that is all fine. b91e1302ad9b ("mm: optimize PageWaiters bit use for unlock_page()") > > + if (time_after_eq(jiffies, deadline)) { > > + if (!blk_mark_rq_complete(rq)) { > > + /* > > + * Relies on the implied MB from test_and_clear() to > > + * order the COMPLETE load against the STARTED load. > > + * Orders against the coherence order in > > + * blk_mq_start_request(). > > I understand "from test_and_set_bit()" (in blk_mark_rq_complete()) and > that the interested cycle is: > > /* in blk_mq_start_request() */ > [STORE STARTED bit = 1 into atomic_flags] > -->co [STORE COMPLETE bit = 0 into atomic_flags] > /* in blk_mq_check_expired() */ > -->rf [LOAD COMPLETE bit = 0 from atomic_flags] > -->po-loc [LOAD STARTED bit = 0 from atomic_flags] > /* in blk_mq_start_request() again */ > -->fr [STORE STARTED bit = 1 into atomic_flags] > > (N.B. Assume all accesses happen to/from the same subword.) > > This cycle being forbidden by the "coherence check", I'd say we do not > need to rely on the MB mentioned by the comment; what am I missing? Nothing, I forgot about the read-after-read thing but did spot the MB. Either one suffices to guarantee the order we need. It just needs to be documented as being relied upon.