On 10/11/22 18:15, Palmer Dabbelt wrote:
Sorry, I thought we'd talked about it somewhere but it must have just
been in meetings and such. Patrick was writing a similar patch set
around the same time so it probably just got tied up in that, we ended
up reducing it to just the strong CAS inline stuff because we couldn't
sort out the correctness of the rest of it.
Now that you mention it, I vaguely recall a discussion about inline
atomics vs libatomic and the discussion on this issue might have been
there rather than in Christophe's patchset.
My initial understanding was that fixing something broken cannot be
an ABI break.
And that the mismatch of the implementation in 2021 and the
recommended mappings in the ratified specification from 2019 is
something that is broken. I still don't know the background here,
but I guess this assumption is incorrect from a historical point of
view.
We agreed that we wouldn't break binaries back when we submitted the
port. The ISA has changed many times since then, including adding the
recommended mappings, but those binaries exist and we can't just
silently break things for users.
That may be too strong of a policy -- just because something worked in
the past doesn't mean it must always work. It really depends on the
contracts specified by the ABI, processor reference documentation, etc
and whether or not the code relies on something that it outside those
contracts. If it does rely on behavior outside the contracts, then we
shouldn't be constrained by such an agreement.
Another way to think about this problem is do we want more code making
incorrect assumptions about the behavior of atomics getting out in the
wild? My take would be that we nip this in the bud, get it right now in
the default configuration, but leave enough bits in place that existing
code continues to work.
However, I'm sure that I am not the only one that assumes the
mappings in the specification to be implemented in compilers and
tools. Therefore I still consider the implementation of the RISC-V
atomics in GCC as broken (at least w.r.t. user expectation from
people that lack the historical background and just read the RISC-V
specification).
You can't just read one of those RISC-V PDFs and assume that
implementations that match those words will function correctly. Those
words regularly change in ways where reasonable readers would end up
with incompatible implementations due to those differences. That's
why we're so explicit about versions and such these days, we're just
getting burned by these old mappings because they're from back when we
though the RISC-V definition of compatibility was going to match the
more common one and we didn't build in fallbacks.
Fair point, but in my mind that argues that the platform must mature
further so that the contracts can be relied upon. That obviously needs
to get fixed and until it does any agreements or guarantees about
behavior of existing code can't be reasonably made. If we're going to
be taken seriously, then those fundamentals have to be rock solid.
I don't think we're just stuck with the status quo, we really just
need to go through the mappings and figure out which can be made both
fast and ABI-compatible. Then we can fix those and see where we
stand, maybe it's good enough or maybe we need to introduce some sort
of compatibility break to make things faster (and/or compatible with
LLVM, where I suspect we're broken right now).
Certainly seems like a good first step. What we can fix without
breaking things we do while we sort out the tougher problems.
If we do need a break then I think it's probably possible to do it in
phases, where we have a middle-ground compatibility mode that works
for both the old and new mappings so distros can gradually move over
as they rebuild packages.
As someone that lived in the distro space for a long time, I would argue
that now is the time to fix this stuff -- before there is a large uptake
in distro consumption.
+Jeff, who was offering to help when the threads got crossed. I'd
punted on a lot of this in the hope Andrea could help out, as I'm not
really a memory model guy and this is pretty far down the rabbit
hole. Happy to have the help if you're offering, though, as what's
there is likely a pretty big performance issue for anyone with a
reasonable memory system.
Hmm, there's a case I'm pondering if I can discuss or not. Probably not
since I can't recall it ever being discussed in public. So I'll just
say this space can be critically important for performance and the
longer we wait, the tougher it gets to fix without causing significant
disruption.
Jeff