On 01/17/2017 09:00 AM, Torvald Riegel wrote:
> I think the ABI should set a baseline for each architecture, and the
> baseline decides whether something is inlinable or not. Thus, the
> x86_64 ABI would make __int128 operations not inlinable (because of the
> issues with cmpxchg16b, see above).
> If users want to use capabilities beyond the baseline, they can choose
> to use flags that alter/extend the ABI. For example, if they use a flag
> that explicitly enables the use of cmpxchg16b for atomics, they also
> need to use a libatomic implementation built in the same way (if
> possible). This then creates a new ABI(-variant), basically.

Yes. Other examples here are power7/power8 and armv6/armv7.
In both cases, the architecture added double-word load(-locked) and
store(-conditional) instructions. In order for us to use these new
instructions inline, libatomic must be updated to use them as well.
The general principle, in my opinion, is that extensions to the ISA should
require that libatomic either be re-built, or perform runtime detection in
order to select the internal algorithm used.
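
To make the runtime-detection option concrete, here is a minimal sketch for x86 only. It is not libatomic's actual code: the names wide_impl, cpu_has_wide_cas and select_wide_impl are made up for illustration, and the only facts it relies on are the CPUID leaf 1 feature bits (EDX bit 8 for cmpxchg8b, ECX bit 13 for cmpxchg16b) and the __get_cpuid helper from <cpuid.h>.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical names throughout; this is a sketch, not libatomic's code. */
enum wide_impl { WIDE_IMPL_LOCK, WIDE_IMPL_CAS };

/* CPUID leaf 1 feature bits. */
#define CPUID1_EDX_CX8  (1u << 8)   /* cmpxchg8b  */
#define CPUID1_ECX_CX16 (1u << 13)  /* cmpxchg16b */

/* Return nonzero if the running CPU offers a compare-and-swap of
   `size` bytes (8 or 16) that this build could use.  */
static int cpu_has_wide_cas(size_t size)
{
    assert(size == 8 || size == 16);
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (size == 8)
        return (edx & CPUID1_EDX_CX8) != 0;
#if defined(__x86_64__)
    return (ecx & CPUID1_ECX_CX16) != 0;
#else
    return 0;   /* cmpxchg16b is only usable in 64-bit mode */
#endif
#else
    (void)size;
    return 0;   /* other targets would need their own probe */
#endif
}

/* Choose the internal algorithm: the instruction if present, else a lock. */
static enum wide_impl select_wide_impl(size_t size)
{
    return cpu_has_wide_cas(size) ? WIDE_IMPL_CAS : WIDE_IMPL_LOCK;
}
```

The point is that the choice is made inside the library at run time, so inline code that uses the new instruction and library code reached through a call agree on the algorithm whenever the hardware has the instruction.
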
In the case of arm, distributions normally either (1) build for a specific cpu
revision, (2) build for old-arm + soft-fpu, (3) build for armv7 + hard-fpu. So
most distributions would not actually require a runtime check for arm.
In the case of power, I assume it's possible to run ppc64 on power8, but every
power8 system to which I have access has ppc64le deployed. Certainly ppc64le
would not need a runtime check, but it would seem prudent for ppc64 to gain a
runtime check for the power8 insns.

> I've made a few tests on my x86_64 machine a few weeks ago, and I didn't
> see cmpxchg16b being used. IIRC, I also looked at libatomic and didn't
> see it (but I don't remember for sure). Either way, if I should have
> been wrong, and we are using cmpxchg16b for loads, this should be fixed.
> Ideally, this should be fixed before the stage 3 deadline this Friday.
> Such a fix might potentially break existing uses, but the earlier we fix
> this, the better.

You needed to use -mcx16, or any other option (such as -march=native) that
implies it. And, you will find that expand_atomic_load does have a
larger-than-word-size fallback path that does use expand_atomic_compare_and_swap.
So, yes, there's something here that needs adjustment.
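
For readers who have not seen the trick, this is roughly what a load built on compare-and-swap looks like at the source level. It is a sketch only: load_via_cas is a made-up name, and it uses a 64-bit object so that it compiles without -mcx16 or libatomic, whereas the case discussed here is the 16-byte one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: emulate an atomic load with compare-and-swap.
   The pointer cannot be const, because the CAS is a read-modify-write.  */
static uint64_t load_via_cas(uint64_t *p)
{
    assert(p != NULL);
    uint64_t expected = 0;
    /* If *p == 0 the CAS succeeds and stores 0 again (no visible change).
       Otherwise it fails and copies the current value into `expected`.
       Either way `expected` ends up holding the value that was observed.  */
    __atomic_compare_exchange_n(p, &expected, 0, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}
```

Because the operation is a read-modify-write, it needs writable memory, so a "load" implemented this way cannot be applied to an object in a read-only mapping.
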

> Section 3 Rationale, alternative 1: I'm wondering if the example is
> correct. For a 4-byte-aligned type of size 3, the implementation cannot
> simply use 4-byte hardware-backed atomics because this will inevitably
> touch the 4th byte I think, and the implementation can't know whether
> this is padding or not. Or do we expect that things like packed structs
> are disallowed?

If we atomically store an unchanged value into the 4th byte, can we tell?
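
What I have in mind is something like the following sketch, in which the three_bytes type and the store3_via_cas32 name are made up for illustration. The 3-byte store is done as a 4-byte compare-and-swap loop that always writes back whatever the 4th byte held when the word was read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 3-byte type that lives in a 4-byte-aligned slot. */
typedef struct { unsigned char b[3]; } three_bytes;

/* Store `v` into the first three bytes of *word with a 4-byte CAS loop.
   The 4th byte is rewritten with the value it already had.  */
static void store3_via_cas32(uint32_t *word, three_bytes v)
{
    assert(((uintptr_t)word & 3u) == 0);
    uint32_t old = __atomic_load_n(word, __ATOMIC_RELAXED);
    uint32_t desired;
    do {
        desired = old;             /* start from the current contents */
        memcpy(&desired, v.b, 3);  /* replace the first three bytes   */
    } while (!__atomic_compare_exchange_n(word, &old, desired, true,
                                          __ATOMIC_SEQ_CST,
                                          __ATOMIC_RELAXED));
}
```

The CAS only succeeds if the whole word, 4th byte included, is unchanged since it was read, so other atomic accesses to the word never see the 4th byte change. Whether a concurrent plain store to that byte by something else is a problem is exactly the padding question raised above.
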

> N3.1: Why do you assume that 8-byte HW atomics are available on i386?
> Because cmpxchg8b is available for CPUs that are the lowest i?86 we
> still intend to support?

For various definitions of "we", I suppose. Red Hat certainly does not support
anything lower than i686, which does have cmpxchg8b.
I suspect that the GNU project still supports i486. I do know that glibc has
dropped support for i386.
I should note that supporting 64-bit atomics on i686 *is* possible, without the
CAS problem that you describe for cmpxchg16b, because we *are* guaranteed that
the FPU supports a 64-bit atomic load/store. And we do already handle this;
see the atomic_loaddi_fpu and atomic_storedi_fpu patterns.
I'll also note that, as per above, this implies that if we build for i586-*,
libatomic should provide runtime paths that detect and use i686 insns, so that
the library is compatible with what the compiler will generate inline given
appropriate command-line options.
r~