We're certainly pushing for the same ABI (A.6 + trailing fence on store) in LLVM as well. I'm about to upload a pull request for the psABI document that describes this version of the ABI, and a bit of the rationale for it. I'll attach the current draft here.
I agree that compatibility is critical here, not just across llvm and gcc, but also with other language implementations. That's part of the reason to get this correct asap. I believe that standardizing on A.6 + trailing fence on store, though initially suboptimal, is by far the best bet to get us to an efficient ABI in the long term. I expect the A.7 ABI to perform well. A.6, even without the trailing store fence, has annoyingly expensive seq_cst loads, which I would really like to get away from. Hans On Fri, Apr 28, 2023 at 10:44 AM Patrick O'Neill <patr...@rivosinc.com> wrote: > On 4/28/23 09:29, Palmer Dabbelt wrote: > > On Fri, 28 Apr 2023 09:14:00 PDT (-0700), jeffreya...@gmail.com wrote: > >> > >> > >> On 4/27/23 10:22, Patrick O'Neill wrote: > >>> This patchset aims to make the RISCV atomics implementation stronger > >>> than the recommended mapping present in table A.6 of the ISA manual. > >>> > https://github.com/riscv/riscv-isa-manual/blob/c7cf84547b3aefacab5463add1734c1602b67a49/src/memory.tex#L1083-L1157 > >>> > >>> > >>> Context > >>> --------- > >>> GCC defined RISC-V mappings [1] before the Memory Model task group > >>> finalized their work and provided the ISA Manual Table A.6/A.7 > >>> mappings[2]. > >>> > >>> For at least a year now, we've known that the mappings were different, > >>> but it wasn't clear if these unique mappings had correctness issues. > >>> > >>> Andrea Parri found an issue with the GCC mappings, showing that > >>> atomic_compare_exchange_weak_explicit(-,-,-,release,relaxed) > >>> mappings do > >>> not enforce release ordering guarantees. (Meaning the GCC mappings have > >>> a correctness issue). > >>> https://inbox.sourceware.org/gcc-patches/Y1GbJuhcBFpPGJQ0@andrea/ > >> Right. I recall this discussion, but thanks for the back reference. > > > > Yep, and it's an important one: that's why we're calling the change a > > bug fix and dropping the current GCC mappings. If we didn't have the > > bug we'd be talking about an ABI break, and since the GCC mappings > > predate the ISA mappings we'd likely need an additional compatibility > > mode. > > > > So I guess we're lucky that we have a concurrency bug. I think it's > > the first time I've said that ;) > > > >>> Why not A.6? > >>> --------- > >>> We can update our mappings now, so the obvious choice would be to > >>> implement Table A.6 (what LLVM implements/ISA manual recommends). > >>> > >>> The reason why that isn't the best path forward for GCC is due to a > >>> proposal by Hans Boehm to add L{d|w|b|h}.aq/rl and S{d|w|b|h}.aq/rl. > >>> > >>> For context, there is discussion about fast-tracking the addition of > >>> these instructions. The RISCV architectural review committee supports > >>> adopting a "new and common atomics ABI for gcc and LLVM toochains ... > >>> that assumes the addition of the preceding instructions”. That common > >>> ABI is likely to be A.7. > >>> https://lists.riscv.org/g/tech-privileged/message/1284 > >>> > >>> Transitioning from A.6 to A.7 will cause an ABI break. We can hedge > >>> against that risk by emitting a conservative fence after SEQ_CST stores > >>> to make the mapping compatible with both A.6 and A.7. > >> So I like that we can have compatible sequences across A.6 and A.7. Of > >> course the concern is performance ;-) > >> > >> > >>> > >>> What does a mapping compatible with both A.6 & A.7 look like? > >>> --------- > >>> It is exactly the same as Table A.6, but SEQ_CST stores have a trailing > >>> fence rw,rw. It's strictly stronger than Table A.6. > >> Right. So my worry here is silicon that is either already available or > >> coming online shortly. Those implementations simply aren't going to be > >> able to use the A.7 mapping, so they pay a penalty. Does it make sense > >> to have the compatibility fences conditional? > > > > IIRC this was discussed somewhere in some thread, but I think there's > > really three ABIs that could be implemented here (ignoring the current > > GCC mappings as they're broken): > > > > * ABI compatible with the current mappings in the ISA manual (A.6). > > This will presumably perform best on extant hardware, given that it's > > what the words in the PDF say to do. > > * ABI compatible with the proposed mappings for the ISA manual (A.7). > > This may perform better on new hardware. > > * ABI compatible with both A.6 and A.7. This is likely slow on both > > new and old hardware, but allows cross-linking. If there's no > > performance issues this would be the only mode we need, but that > > seems unlikely. > > > > IMO those should be encoded somewhere in the ELF. I'd just do it as > > two bits in the header, but last time I proposed header bits the psABI > > folks wanted to do something different. I don't think where we encode > > this matters all that much, but if we're doing to treat these as real > > long-term ABIs we should have some way to encode that. > > > > There's also the orthogonal axis of whether we use the new > > instructions. Those aren't in specs yet so I think we can hold off on > > them for a bit, but they're the whole point of doing the ABI break so > > we should at least think them over. I think we're OK because we've > > just split out the ABI from the ISA here, but I'm not sure if I'm > > missing something. > > > > Now that I wrote that, though, I remember talking to Patrick about it > > and we drew a bunch of stuff on the whiteboard and then got confused. > > So sorry if I'm just out of the loop here... > This looks up-to-date with how I understand it. > >> > >>> > >>> Benchmark Interpretation > >>> -------- > >>> As expected, out of order machines are significantly faster with the > >>> REL_STORE mappings. Unexpectedly, the in-order machines are > >>> significantly slower with REL_STORE rather than REL_STORE_FENCE. > >> Yea, that's a bit of a surprise. > >> > >>> > >>> Most machines in the wild are expected to use Table A.7 once the > >>> instructions are introduced. > >>> Incurring this added cost now will make it easier for compiled RISC-V > >>> binaries to transition to the A.7 memory model mapping. > >>> > >>> The performance benefits of moving to A.7 can be more clearly seen > >>> using > >>> an almost-all-load microbenchmark (included on page 3 of Hans’ > >>> proposal). The code for that microbenchmark is attached below [5]. > >>> > https://lists.riscv.org/g/tech-unprivileged/attachment/382/0/load-acquire110422.pdf > >>> https://lists.riscv.org/g/tech-unprivileged/topic/92916241 > >> Yea. I'm not questioning the value of the new instructions that are on > >> the horizon, just the value of trying to make everything A.7 compatible. > >> > >> > >>> > >>> Conformance test cases notes > >>> -------- > >>> The conformance tests in this patch are a good sanity check but do not > >>> guarantee exactly following Table A.6. It checks that the right > >>> instructions are emitted (ex. fence rw,r) but not the order of those > >>> instructions. > >> Note there is a way to check ordering as well. You might look at the > >> check-function-bodies approach. I think there are some recent examples > >> in the gcc risc-v specific tests. > I'll check that out - thank you! > >>> > >>> LLVM mapping notes > >>> -------- > >>> LLVM emits corresponding fences for atomic_signal_fence instructions. > >>> This seems to be an oversight since AFAIK atomic_signal_fence acts as a > >>> compiler directive. GCC does not emit any fences for > >>> atomic_signal_fence > >>> instructions. > >> This starts to touch on a larger concern. Specifically I'd really like > >> the two compilers to be compatible in terms of the code they generate > >> for the various atomics. > >> > >> What I worry about is code being written (by design or accident) that is > >> dependent on the particular behavior of one compiler and then if that > >> code gets built with the other compiler, and we end up different > >> behavior. Worse yet, if/when this happens, it's likely to be tough to > >> expose, reproduce & debug. > Agreed. > > I'll open an issue with LLVM and see what they have to say about this > particular behavior. Ideally we'd have perfectly compatible compilers > (for atomic ops) by the end of this :) > > AFAICT GCC hasn't ever been emitting fences for these instructions. > (& This behavior isn't touched by the patchset). > >> > >> Do you have any sense of where Clang/LLVM is going to go WRT providing > >> an A.6 mapping that is compatible with A.7 by using the additional > >> fences? > I don't - Based on how LLVM initially waited for A.6, I'd speculate that > they would wait for an official compatibility mapping to be added the > PSABI doc or ISA manual before implementing it. > > I imagine there will be demand for an ABI-compatible mapping so the ABI > break has a transition plan, but the timeline for LLVM isn't clear to me. > > Patrick > >> > >> Jeff >