https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110592

Taylor R Campbell <campbell+gcc-bugzilla at mumble dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |campbell+gcc-bugzilla@mumbl
                   |                            |e.net

--- Comment #5 from Taylor R Campbell <campbell+gcc-bugzilla at mumble dot net> 
---
(In reply to Eric Botcazou from comment #4)
> Well, you need to elaborate a bit here, because the current configuration
> has been there for a quarter of century and everybody had apparently
> survived it until a couple of days ago.

For most of that quarter century, memory ordering was limited to out-of-line
barrier/fence subroutines implemented in assembly, like membar_sync in Solaris
and NetBSD, or the thread-switch assembly routines in the kernel.

It is only relatively recently, since C11 and C++11, that a lot of programs
started using in-line barriers/fences and ordered memory operations like
store-release/load-acquire.

In that time, sparcv7 and sparcv8 haven't gotten a lot of attention, of course.

But since they were introduced, NetBSD has had a common userland for sparcv7
and sparcv8, just called `NetBSD/sparc', with a special libc loaded on sparcv8
to use v8-only instructions like SMUL and UMUL for runtime multiplication
subroutines to improve performance.  (We could in principle do the same for
LDSTUB in membar_sync on sparcv7, although we don't at the moment.)

But now that programs rely on compiler-generated barriers, there's a conflict
between gcc's v7 and v8 code generation:

1. `gcc -mcpu=v7' generates code that lacks LDSTUB where store-before-load
barriers are needed, so anything that uses Dekker's algorithm with in-line
barriers won't work correctly on a sparcv8 CPU (but it will only manifest in
extremely rare, hard-to-diagnose scenarios, because Dekker's algorithm is so
obscure).

2. `gcc -mcpu=v8' generates code that uses SMUL and UMUL and other instructions
that don't exist on sparcv7.

Evidently gcc can be made to generate SMUL/UMUL but omit LDSTUB barriers by
using `gcc -mcpu=v8 -mmemory-model=sc', but the other way around doesn't work:
`gcc -mcpu=v7 -mmemory-model=tso' still omits the LDSTUB barriers, because the
code generation rules for barriers are all gated on TARGET_V8 || TARGET_V9.

What we would like to do for NetBSD/sparc is use `-mcpu=v7 -mmemory-model=tso'
-- that is, if it worked -- by default.  The original submitter drafted a
relatively small patch to achieve this, mostly by removing TARGET_V8 ||
TARGET_V9 conditionals or changing TARGET_V8 to !TARGET_V9 in membar-related
code generation rules.  But we'd also like to avoid diverging from gcc
upstream.  Could we convince you to take up an approach like this?

Applications built to run on v7-only, of course, could omit the LDSTUBs by
using `-mcpu=v7 -mmemory-model=sc' (or perhaps we could have the default be
`-mcpu=v7 -mmemory-model=sc', but have bare `-mcpu=v7' imply `-mcpu=v7
-mmemory-model=sc' or something), and applications built to run on v8-only can
still use `-mcpu=v8' to take advantage of `SMUL/UMUL'.

I expect this would only affect a tiny fraction of programs in extremely rare
scenarios -- those that actually rely on Dekker's algorithm (already rare), and
hit problems with memory ordering (also rare, only under high contention),
using in-line barriers or ordered memory operations (which wasn't the norm a
quarter century ago when v7 and v8 were relevant).  So you have to go out of
your way to hit problems in practice, and any negative performance impact of
the extra LDSTUBs on v7 CPUs that don't need them is likely negligible.  But
it's clear from code inspection and theory that the problem is there.

Reply via email to