On 4 Sep 2020, at 14:36, Jessica Clarke <jrt...@freebsd.org> wrote: > On 4 Sep 2020, at 14:09, Konstantin Belousov <kostik...@gmail.com> wrote: >> On Fri, Sep 04, 2020 at 11:22:18AM +0000, Marcin Wojtas wrote: >>> Author: mw >>> Date: Fri Sep 4 11:22:18 2020 >>> New Revision: 365326 >>> URL: https://svnweb.freebsd.org/changeset/base/365326 >>> >>> Log: >>> MFC: r346593 >>> >>> Add barrier in buf ring peek function to prevent race in ARM and ARM64. >>> >>> Obtained from: Semihalf >>> Sponsored by: Amazon, Inc. >>> >>> Modified: >>> stable/12/sys/sys/buf_ring.h >>> Directory Properties: >>> stable/12/ (props changed) >>> >>> Modified: stable/12/sys/sys/buf_ring.h >>> ============================================================================== >>> --- stable/12/sys/sys/buf_ring.h Fri Sep 4 04:31:56 2020 >>> (r365325) >>> +++ stable/12/sys/sys/buf_ring.h Fri Sep 4 11:22:18 2020 >>> (r365326) >>> @@ -310,14 +310,23 @@ buf_ring_peek_clear_sc(struct buf_ring *br) >>> if (!mtx_owned(br->br_lock)) >>> panic("lock not held on single consumer dequeue"); >>> #endif >>> - /* >>> - * I believe it is safe to not have a memory barrier >>> - * here because we control cons and tail is worst case >>> - * a lagging indicator so we worst case we might >>> - * return NULL immediately after a buffer has been enqueued >>> - */ >>> + >>> if (br->br_cons_head == br->br_prod_tail) >>> return (NULL); >>> + >>> +#if defined(__arm__) || defined(__aarch64__) >>> + /* >>> + * The barrier is required there on ARM and ARM64 to ensure, that >>> + * br->br_ring[br->br_cons_head] will not be fetched before the above >>> + * condition is checked. >>> + * Without the barrier, it is possible, that buffer will be fetched >>> + * before the enqueue will put mbuf into br, then, in the meantime, the >>> + * enqueue will update the array and the br_prod_tail, and the >>> + * conditional check will be true, so we will return previously fetched >>> + * (and invalid) buffer. >>> + */ >>> + atomic_thread_fence_acq(); >>> +#endif >> >> Putting the semantic of the change aside, why did you added the fence (it is >> a fence, not barrier as stated in the comment) only to arm* ? If it is >> needed, it is needed for all arches. > > Agreed. The code looks fine, though I would have made it an acquire > load of br_prod_tail myself to be able to take advantage load-acquire > instructions when present, and better document what the exact issue is. > I also don't think the comment needs to be quite so extensive > (especially since atomic_load_acq_32 is somewhat self-documenting in > terms of one half of the race); if we had a comment like this for every > fence in the kernel we'd never get anything done. > > There's also an ARM-specific fence in buf_ring_dequeue_sc: > >> /* >> * This is a workaround to allow using buf_ring on ARM and ARM64. >> * ARM64TODO: Fix buf_ring in a generic way. >> * REMARKS: It is suspected that br_cons_head does not require >> * load_acq operation, but this change was extensively tested >> * and confirmed it's working. To be reviewed once again in >> * FreeBSD-12. >> * >> * Preventing following situation: >> >> * Core(0) - buf_ring_enqueue() >> Core(1) - buf_ring_dequeue_sc() >> * ----------------------------------------- >> ---------------------------------------------- >> * >> * >> cons_head = br->br_cons_head; >> * atomic_cmpset_acq_32(&br->br_prod_head, ...)); >> * >> buf = br->br_ring[cons_head]; <see <1>> >> * br->br_ring[prod_head] = buf; >> * atomic_store_rel_32(&br->br_prod_tail, ...); >> * >> prod_tail = br->br_prod_tail; >> * >> if (cons_head == prod_tail) >> * >> return (NULL); >> * >> <condition is false and code uses invalid(old) buf>` >> * >> * <1> Load (on core 1) from br->br_ring[cons_head] can be reordered >> (speculative readed) by CPU. >> */ >> #if defined(__arm__) || defined(__aarch64__) >> cons_head = atomic_load_acq_32(&br->br_cons_head); >> #else >> cons_head = br->br_cons_head; >> #endif >> prod_tail = atomic_load_acq_32(&br->br_prod_tail); > > > The comment is completely correct that the ARM-specific fence is a > waste of time. It's the single-consumer path, so such fences are just > synchronising with the current thread and thus pointless. The important > one is the load-acquire of br_prod_tail, as has been discovered (sort > of) in the peek case leading to this comment, which already stops the > reordering in question.
Fixes filed as https://reviews.freebsd.org/D26336 for those interested. Jess _______________________________________________ svn-src-all@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"