Re: Fences/Barriers when mixing C++ atomics and non-atomics

Vineet Gupta Thu, 13 Oct 2022 14:13:05 -0700

Hi Hans,

On 10/13/22 13:54, Hans Boehm wrote:

The generated code here is correct in both cases. In the RISC-V case, I believe it is conservative, at a minimum, in that atomics should not imply IO ordering. We had an earlier discussion, which seemed to have consensus in favor of that opinion. I believe clang does not enforce IO ordering.
You can think of a "sequentially consistent" load roughly as enforcing two properties:
1) It behaves as an "acquire" load. Later (in program order) memory operations do not advance past it. This is implicit for x86. It requires the trailing fence on RISC-V, which could probably be weakened to r,rw.

Acq implies later things won't leak out, but prior things could still leak in, meaning a prior write could happen after the load, which contradicts what the user is asking for with load(seq_cst) on x86?

2) It ensures that seq_cst operations are fully ordered. This means that, in addition to (1), and the corresponding fence for stores, every seq_cst store must be separated from a seq_cst load by at least a w,r fence, so a seq_cst store followed by a seq_cst load is not reordered.
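
Property (2) is what the classic store-buffering litmus test exercises. The following is a minimal sketch, not taken from the thread: the function name count_forbidden_outcomes and the iteration count are illustrative assumptions. With seq_cst on both the stores and the loads, the outcome where both threads read 0 is forbidden by the C++ memory model, so the count should always be zero. A run that observes zero does not prove the mapping correct; it only illustrates the guarantee.

```cpp
#include <atomic>
#include <thread>

// Runs the store-buffering litmus test `iterations` times and returns how
// often both threads read 0. With seq_cst stores and loads that outcome is
// forbidden, so the result is expected to be 0.
int count_forbidden_outcomes(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;  // read only after join(), so no data race

        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);   // seq_cst store ...
            r1 = y.load(std::memory_order_seq_cst);  // ... then seq_cst load
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) ++forbidden;
    }
    return forbidden;
}
```

If the stores were weakened to memory_order_relaxed or memory_order_release, the both-zero outcome would no longer be forbidden, which is the case the w,r separation in (2) exists to rule out.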


This makes sense when both store -> load are seq_cst.

But the question is what happens when that store is non atomic. IOW if we had a store(relaxed) -> load(seq_cst) would the generated code still ensure that load had a full barrier to prevent

w,r fences are discouraged on RISC-V, and probably no better than rw,rw, so that's how the leading fence got there. (Again the io ordering should disappear. It's the responsibility of IO code to insert that explicitly, rather than paying for it everywhere.)


Thanks for explaining the RV semantics.

x86 does (2) by associating that fence with stores instead of loads, either by using explicit fences after stores, or by turning stores into xchg.

That makes sense, as x86 has ld -> ld and ld -> st architecturally ordered, so any fences ought to be associated with st.
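
The store-side convention described above can be seen with a pair of trivial accessors. This is a minimal sketch; the names flag, set_flag and get_flag are hypothetical and not from the thread. The comments on the generated code restate what is said above for x86 (a fence or xchg on the store, nothing extra on the load); the exact instructions depend on the compiler and version, so treat them as an expectation to check in a disassembly rather than a guarantee.

```cpp
#include <atomic>

std::atomic<int> flag{0};  // hypothetical shared variable

// seq_cst store: on x86-64 the ordering cost is expected to be paid here,
// typically as an xchg or as a mov followed by mfence.
void set_flag(int v) { flag.store(v, std::memory_order_seq_cst); }

// seq_cst load: on x86-64 this is expected to be a plain mov, which is why
// the bar_seqcst testcase shows no barrier around its load.
int get_flag() { return flag.load(std::memory_order_seq_cst); }
```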


Thx,
-Vineet

RISC-V could do the same. And I believe that if the current A extension were the final word on the architecture, it should. But that convention is not compatible with the later introduction of an "acquire load", which I think is essential for performance, at least on larger cores. So I think the two fence mapping for loads should be maintained for now, as I suggested in the document I posted to the list.


Hans

On Thu, Oct 13, 2022 at 12:31 PM Vineet Gupta <vine...@rivosinc.com> wrote:


    Hi,

    I have a testcase (from real workloads) involving C++ atomics and am
    trying to understand the codegen (gcc 12) for RVWMO and x86.
    It mixes atomics with non-atomics, so it is not obvious what the
    behavior is intended to be, hence the explicit CC of subject matter
    experts (apologies for that in advance).

    The test has a non-atomic store followed by an atomic_load(SEQ_CST). I
    assume that unadorned direct access defaults to the
    safest/conservative seq_cst.

        extern int g;
        std::atomic<int> a;

        int bar_noaccessor(int n, int *n2)
        {
             *n2 = g;
             return n + a;
        }

        int bar_seqcst(int n, int *n2)
        {
             *n2 = g;
             return n + a.load(std::memory_order_seq_cst);
        }

    On RV (rvwmo), with current gcc 12 we get 2 full fences around the
    load, as prescribed by Privileged Spec, Chapter A, Table A.6 (Mappings
    from C/C++ to RISC-V primitives).

        _Z10bar_seqcstiPi:
        .LFB382:
             .cfi_startproc
             lui    a5,%hi(g)
             lw    a5,%lo(g)(a5)
             sw    a5,0(a1)
             fence    iorw,iorw
             lui    a5,%hi(a)
             lw    a5,%lo(a)(a5)
             fence    iorw,iorw
             addw    a0,a5,a0
             ret


    OTOH, for x86 (same default toggles) there are no barriers at all.

        _Z10bar_seqcstiPi:
             endbr64
             movl    g(%rip), %eax
             movl    %eax, (%rsi)
             movl    a(%rip), %eax
             addl    %edi, %eax
             ret


    My naive intuition was that x86 TSO would require a fence before
    load(seq_cst) for a prior store, even if that store was non-atomic,
    to ensure the load didn't bubble up ahead of the store.

    Perhaps this raises the general question of intermixing non-atomic
    accesses with atomics, and whether that is undefined behavior or some
    such. I skimmed through the Atomic Operations library chapter of the
    C++14 specification, but nothing is jumping out on the topic.

    Or is it something deeper, related to the as-if rule?

    Thx,
    -Vineet
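
On the general question raised in the original message: mixing non-atomic and atomic accesses is not undefined behavior by itself. In the C++ memory model, undefined behavior comes from a data race, meaning conflicting non-atomic accesses that are not ordered by happens-before. A minimal sketch of a well-defined mix follows; the function name publish_and_consume and the variable names are hypothetical, not from the thread. A plain store is published with a release store and picked up with an acquire load.

```cpp
#include <atomic>
#include <thread>

// Writes a non-atomic payload in one thread, publishes it through an atomic
// flag, and reads it back in another thread. Returns the value the consumer
// observed.
int publish_and_consume(int value) {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // atomic flag guarding the payload
    int seen = -1;                   // read only after join()

    std::thread producer([&] {
        payload = value;                               // non-atomic store
        ready.store(true, std::memory_order_release);  // publish it
    });
    std::thread consumer([&] {
        // The acquire load that reads `true` synchronizes with the release
        // store, so the payload write happens-before the read below.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // no data race
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Note the direction of the guarantee: the ordering comes from the release store that follows the plain write. An atomic load that follows a plain store, as in the bar_seqcst testcase, only has to act as an acquire with respect to that plain store, as discussed in the reply above.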
