Hi Richard,

> What I have not done, but is now a possibility, is to use a custom
> calling convention for the out-of-line routines. I now only clobber
> 2 (or 3, for TImode) temp regs and set a return value.

This would be a great feature to have since it reduces the overhead of
outlining considerably.

> I think this patch series would be great to have for GCC 10!

Agreed. I've got a couple of general comments:

* The option name -matomic-ool sounds too abbreviated. I think e.g.
  -moutline-atomics is more descriptive and user-friendly.

* Similarly, the exported __aa64_have_atomics variable could be named
  __aarch64_have_lse_atomics so it's clear that it is about LSE atomics.

+@item -matomic-ool
+@itemx -mno-atomic-ool
+Enable or disable calls to out-of-line helpers to implement atomic operations.
+These helpers will, at runtime, determine if ARMv8.1-Atomics instructions
+should be used; if not, they will use the load/store-exclusive instructions
+that are present in the base ARMv8.0 ISA.
+
+This option is only applicable when compiling for the base ARMv8.0
+instruction set. If using a later revision, e.g. @option{-march=armv8.1-a}
+or @option{-march=armv8-a+lse}, the ARMv8.1-Atomics instructions will be
+used directly.

So what is the behaviour when you explicitly select a specific CPU?

+/* Branch to LABEL if LSE is enabled.
+   The branch should be easily predicted, in that it will, after constructors,
+   always branch the same way. The expectation is that systems that implement
+   ARMv8.1-Atomics are "beefier" than those that omit the extension.
+   By arranging for the fall-through path to use load-store-exclusive insns,
+   we aid the branch predictor of the smallest cpus. */

I'd say that by the time GCC 10 is released and used in distros, systems
without LSE atomics would be practically non-existent. So we should favour
LSE atomics by default, i.e. make the LSE path the fall-through.

Cheers,
Wilco