https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110592

Taylor R Campbell <campbell+gcc-bugzilla at mumble dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |campbell+gcc-bugzilla@mumble.net

--- Comment #5 from Taylor R Campbell <campbell+gcc-bugzilla at mumble dot net> ---
(In reply to Eric Botcazou from comment #4)
> Well, you need to elaborate a bit here, because the current configuration
> has been there for a quarter of century and everybody had apparently
> survived it until a couple of days ago.

For most of that quarter century, memory ordering was limited to
out-of-line barrier/fence subroutines implemented in assembly, like
membar_sync in Solaris and NetBSD, or the thread-switch assembly
routines in the kernel.  It is only relatively recently, since C11 and
C++11, that a lot of programs started using in-line barriers/fences and
ordered memory operations like store-release/load-acquire.

In that time, sparcv7 and sparcv8 haven't gotten a lot of attention, of
course.  But since they were introduced, NetBSD has had a common
userland for sparcv7 and sparcv8, just called `NetBSD/sparc', with a
special libc loaded on sparcv8 to use v8-only instructions like SMUL
and UMUL for runtime multiplication subroutines to improve performance.
(We could in principle do the same for LDSTUB in membar_sync on
sparcv7, although we don't at the moment.)

But now that programs rely on compiler-generated barriers, there's a
conflict between gcc's v7 and v8 code generation:

1. `gcc -mcpu=v7' generates code that lacks LDSTUB where
   store-before-load barriers are needed, so anything that uses
   Dekker's algorithm with in-line barriers won't work correctly on a
   sparcv8 CPU (but it will only manifest in extremely rare,
   hard-to-diagnose scenarios, because Dekker's algorithm is so
   obscure).

2. `gcc -mcpu=v8' generates code that uses SMUL and UMUL and other
   instructions that don't exist on sparcv7.
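
To make item 1 concrete, here is a minimal C11 sketch of the kind of code affected. It is a simplified try-lock variant of the Dekker handshake (no turn variable), and the names `want`, `try_enter` and `leave` are made up for illustration; they are not taken from any real program. The point is only to show where a store-before-load barrier is required: between the store to our own flag and the load of the other thread's flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flags: want[i] != 0 means thread i wants the critical section. */
static atomic_int want[2];

/*
 * Entry half of a simplified Dekker-style handshake for thread `self`
 * (0 or 1).  Returns true if the caller may enter the critical section.
 */
static bool
try_enter(int self)
{
	int other = 1 - self;

	/* Announce our intent to enter. */
	atomic_store_explicit(&want[self], 1, memory_order_relaxed);

	/*
	 * Store-before-load barrier: the store above must be visible
	 * before the load below is performed.  Under TSO the hardware
	 * may otherwise let the load pass the buffered store, so the
	 * compiler has to emit a real barrier instruction here.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&want[other], memory_order_relaxed) == 0)
		return true;

	/* Contended: withdraw our claim and let the caller retry. */
	atomic_store_explicit(&want[self], 0, memory_order_relaxed);
	return false;
}

/* Exit half: release the critical section. */
static void
leave(int self)
{
	assert(atomic_load_explicit(&want[self], memory_order_relaxed) == 1);
	atomic_store_explicit(&want[self], 0, memory_order_release);
}
```

If the fence compiles to nothing, two threads on a TSO machine can each store their own flag, each read the other's flag as 0 from memory, and both enter. That is the failure mode described above, and it needs both threads to race through `try_enter` at nearly the same moment, which is why it would be so rare in practice.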

Evidently gcc can be made to generate SMUL/UMUL but omit LDSTUB
barriers by using `gcc -mcpu=v8 -mmemory-model=sc', but the other way
around doesn't work: `gcc -mcpu=v7 -mmemory-model=tso' still omits the
LDSTUB barriers, because the code generation rules for barriers are all
gated on TARGET_V8 || TARGET_V9.

What we would like to do for NetBSD/sparc is use `-mcpu=v7
-mmemory-model=tso' -- that is, if it worked -- by default.  The
original submitter drafted a relatively small patch to achieve this,
mostly by removing TARGET_V8 || TARGET_V9 conditionals or changing
TARGET_V8 to !TARGET_V9 in membar-related code generation rules.  But
we'd also like to avoid diverging from gcc upstream.  Could we convince
you to take up an approach like this?

Applications built to run on v7-only, of course, could omit the LDSTUBs
by using `-mcpu=v7 -mmemory-model=sc' (or perhaps we could have the
default be `-mcpu=v7 -mmemory-model=tso', but have bare `-mcpu=v7'
imply `-mcpu=v7 -mmemory-model=sc' or something), and applications
built to run on v8-only can still use `-mcpu=v8' to take advantage of
`SMUL/UMUL'.

I expect this would only affect a tiny fraction of programs in
extremely rare scenarios -- those that actually rely on Dekker's
algorithm (already rare), and hit problems with memory ordering (also
rare, only under high contention), using in-line barriers or ordered
memory operations (which wasn't the norm a quarter century ago when v7
and v8 were relevant).  So you have to go out of your way to hit
problems in practice, and any negative performance impact of the extra
LDSTUBs on v7 CPUs that don't need them is likely negligible.  But it's
clear from code inspection and theory that the problem is there.
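
For reference, the flag combinations discussed above can be summarised as a toy truth table in C. This is not gcc source and does not use gcc's internal names. The enums and both functions are hypothetical, and the table covers only the sc and tso models and only the question of whether a store-before-load barrier is emitted for a sequentially consistent fence. It assumes the behaviour is exactly as described in this comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for -mcpu= and -mmemory-model= settings. */
enum cpu { CPU_V7, CPU_V8, CPU_V9 };
enum mm  { MM_SC, MM_TSO };

/*
 * Behaviour as described above: barrier rules are gated on v8/v9,
 * so the memory model is only consulted for those CPUs.
 */
static bool
current_emits_barrier(enum cpu cpu, enum mm mm)
{
	assert(mm == MM_SC || mm == MM_TSO);
	bool gated_in = (cpu == CPU_V8 || cpu == CPU_V9);
	return gated_in && mm != MM_SC;
}

/*
 * Behaviour requested above: drop the CPU gate, so the memory
 * model alone decides whether a barrier is needed.
 */
static bool
proposed_emits_barrier(enum cpu cpu, enum mm mm)
{
	assert(mm == MM_SC || mm == MM_TSO);
	(void)cpu;	/* the CPU no longer matters for this decision */
	return mm != MM_SC;
}
```

The only row that differs between the two functions is v7 with tso, which is the configuration we would like to use for NetBSD/sparc.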