We're certainly pushing for the same ABI (A.6 + trailing fence on store) in
LLVM as well. I'm about to upload a pull request for the psABI document
that describes this version of the ABI, and a bit of the rationale for it.
I'll attach the current draft here.

I agree that compatibility is critical here, not just across llvm and gcc,
but also with other language implementations. That's part of the reason to
get this correct asap.

I believe that standardizing on A.6 + trailing fence on store, though
initially suboptimal, is by far the best bet to get us to an efficient ABI
in the long term. I expect the A.7 ABI to perform well. A.6, even without
the trailing store fence, has annoyingly expensive seq_cst loads, which I
would really like to get away from.

Hans







On Fri, Apr 28, 2023 at 10:44 AM Patrick O'Neill <patr...@rivosinc.com>
wrote:

> On 4/28/23 09:29, Palmer Dabbelt wrote:
> > On Fri, 28 Apr 2023 09:14:00 PDT (-0700), jeffreya...@gmail.com wrote:
> >>
> >>
> >> On 4/27/23 10:22, Patrick O'Neill wrote:
> >>> This patchset aims to make the RISCV atomics implementation stronger
> >>> than the recommended mapping present in table A.6 of the ISA manual.
> >>>
> https://github.com/riscv/riscv-isa-manual/blob/c7cf84547b3aefacab5463add1734c1602b67a49/src/memory.tex#L1083-L1157
> >>>
> >>>
> >>> Context
> >>> ---------
> >>> GCC defined RISC-V mappings [1] before the Memory Model task group
> >>> finalized their work and provided the ISA Manual Table A.6/A.7
> >>> mappings[2].
> >>>
> >>> For at least a year now, we've known that the mappings were different,
> >>> but it wasn't clear if these unique mappings had correctness issues.
> >>>
> >>> Andrea Parri found an issue with the GCC mappings, showing that
> >>> atomic_compare_exchange_weak_explicit(-,-,-,release,relaxed)
> >>> mappings do
> >>> not enforce release ordering guarantees. (Meaning the GCC mappings have
> >>> a correctness issue).
> >>> https://inbox.sourceware.org/gcc-patches/Y1GbJuhcBFpPGJQ0@andrea/
> >> Right.  I recall this discussion, but thanks for the back reference.
> >
> > Yep, and it's an important one: that's why we're calling the change a
> > bug fix and dropping the current GCC mappings.  If we didn't have the
> > bug we'd be talking about an ABI break, and since the GCC mappings
> > predate the ISA mappings we'd likely need an additional compatibility
> > mode.
> >
> > So I guess we're lucky that we have a concurrency bug.  I think it's
> > the first time I've said that ;)
> >
> >>> Why not A.6?
> >>> ---------
> >>> We can update our mappings now, so the obvious choice would be to
> >>> implement Table A.6 (what LLVM implements/ISA manual recommends).
> >>>
> >>> The reason why that isn't the best path forward for GCC is due to a
> >>> proposal by Hans Boehm to add L{d|w|b|h}.aq/rl and S{d|w|b|h}.aq/rl.
> >>>
> >>> For context, there is discussion about fast-tracking the addition of
> >>> these instructions. The RISCV architectural review committee supports
> >>> adopting a "new and common atomics ABI for gcc and LLVM toochains ...
> >>> that assumes the addition of the preceding instructions”. That common
> >>> ABI is likely to be A.7.
> >>>    https://lists.riscv.org/g/tech-privileged/message/1284
> >>>
> >>> Transitioning from A.6 to A.7 will cause an ABI break. We can hedge
> >>> against that risk by emitting a conservative fence after SEQ_CST stores
> >>> to make the mapping compatible with both A.6 and A.7.
> >> So I like that we can have compatible sequences across A.6 and A.7.  Of
> >> course the concern is performance ;-)
> >>
> >>
> >>>
> >>> What does a mapping compatible with both A.6 & A.7 look like?
> >>> ---------
> >>> It is exactly the same as Table A.6, but SEQ_CST stores have a trailing
> >>> fence rw,rw. It's strictly stronger than Table A.6.
> >> Right.  So my worry here is silicon that is either already available or
> >> coming online shortly.   Those implementations simply aren't going to be
> >> able to use the A.7 mapping, so they pay a penalty.  Does it make sense
> >> to have the compatibility fences conditional?
> >
> > IIRC this was discussed somewhere in some thread, but I think there's
> > really three ABIs that could be implemented here (ignoring the current
> > GCC mappings as they're broken):
> >
> > * ABI compatible with the current mappings in the ISA manual (A.6).
> >  This will presumably perform best on extant hardware, given that it's
> >  what the words in the PDF say to do.
> > * ABI compatible with the proposed mappings for the ISA manual (A.7).
> >  This may perform better on new hardware.
> > * ABI compatible with both A.6 and A.7.  This is likely slow on both
> > new  and old hardware, but allows cross-linking.  If there's no
> > performance  issues this would be the only mode we need, but that
> > seems unlikely.
> >
> > IMO those should be encoded somewhere in the ELF.  I'd just do it as
> > two bits in the header, but last time I proposed header bits the psABI
> > folks wanted to do something different.  I don't think where we encode
> > this matters all that much, but if we're doing to treat these as real
> > long-term ABIs we should have some way to encode that.
> >
> > There's also the orthogonal axis of whether we use the new
> > instructions.  Those aren't in specs yet so I think we can hold off on
> > them for a bit, but they're the whole point of doing the ABI break so
> > we should at least think them over.  I think we're OK because we've
> > just split out the ABI from the ISA here, but I'm not sure if I'm
> > missing something.
> >
> > Now that I wrote that, though, I remember talking to Patrick about it
> > and we drew a bunch of stuff on the whiteboard and then got confused.
> > So sorry if I'm just out of the loop here...
> This looks up-to-date with how I understand it.
> >>
> >>>
> >>> Benchmark Interpretation
> >>> --------
> >>> As expected, out of order machines are significantly faster with the
> >>> REL_STORE mappings. Unexpectedly, the in-order machines are
> >>> significantly slower with REL_STORE rather than REL_STORE_FENCE.
> >> Yea, that's a bit of a surprise.
> >>
> >>>
> >>> Most machines in the wild are expected to use Table A.7 once the
> >>> instructions are introduced.
> >>> Incurring this added cost now will make it easier for compiled RISC-V
> >>> binaries to transition to the A.7 memory model mapping.
> >>>
> >>> The performance benefits of moving to A.7 can be more clearly seen
> >>> using
> >>> an almost-all-load microbenchmark (included on page 3 of Hans’
> >>> proposal). The code for that microbenchmark is attached below [5].
> >>>
> https://lists.riscv.org/g/tech-unprivileged/attachment/382/0/load-acquire110422.pdf
> >>>    https://lists.riscv.org/g/tech-unprivileged/topic/92916241
> >> Yea.  I'm not questioning the value of the new instructions that are on
> >> the horizon, just the value of trying to make everything A.7 compatible.
> >>
> >>
> >>>
> >>> Conformance test cases notes
> >>> --------
> >>> The conformance tests in this patch are a good sanity check but do not
> >>> guarantee exactly following Table A.6. It checks that the right
> >>> instructions are emitted (ex. fence rw,r) but not the order of those
> >>> instructions.
> >> Note there is a way to check ordering as well.  You might look at the
> >> check-function-bodies approach.  I think there are some recent examples
> >> in the gcc risc-v specific tests.
> I'll check that out - thank you!
> >>>
> >>> LLVM mapping notes
> >>> --------
> >>> LLVM emits corresponding fences for atomic_signal_fence instructions.
> >>> This seems to be an oversight since AFAIK atomic_signal_fence acts as a
> >>> compiler directive. GCC does not emit any fences for
> >>> atomic_signal_fence
> >>> instructions.
> >> This starts to touch on a larger concern.  Specifically I'd really like
> >> the two compilers to be compatible in terms of the code they generate
> >> for the various atomics.
> >>
> >> What I worry about is code being written (by design or accident) that is
> >> dependent on the particular behavior of one compiler and then if that
> >> code gets built with the other compiler, and we end up different
> >> behavior.  Worse yet, if/when this happens, it's likely to be tough to
> >> expose, reproduce & debug.
> Agreed.
>
> I'll open an issue with LLVM and see what they have to say about this
> particular behavior. Ideally we'd have perfectly compatible compilers
> (for atomic ops) by the end of this :)
>
> AFAICT GCC hasn't ever been emitting fences for these instructions.
> (& This behavior isn't touched by the patchset).
> >>
> >> Do you have any sense of where Clang/LLVM is going to go WRT providing
> >> an A.6 mapping that is compatible with A.7 by using the additional
> >> fences?
> I don't - Based on how LLVM initially waited for A.6, I'd speculate that
> they would wait for an official compatibility mapping to be added the
> PSABI doc or ISA manual before implementing it.
>
> I imagine there will be demand for an ABI-compatible mapping so the ABI
> break has a transition plan, but the timeline for LLVM isn't clear to me.
>
> Patrick
> >>
> >> Jeff
>

Reply via email to