https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280
Bug ID: 113280
Summary: Strange error for empty inline assembly with +X constraint
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: david at westcontrol dot com
Target Milestone: ---
I am getting strange results with an empty inline assembly line, used as an
optimisation barrier. This is the test code I've tried in different setups:
#define opt 1
typedef int T;

T test(T a, T b) {
    T x = a + b;
#if opt == 1
    asm ("" : "+X" (x));
#endif
#if opt == 2
    asm ("" : "+X" (x) : "0" (x));
#endif
#if opt == 3
    asm ("" :: "X" (x));
    asm ("" : "+X" (x));
#endif
    return x - b;
}
The idea of the assembly line is to act as an optimisation barrier: it forces
the compiler to calculate the value of "x" before the assembly line, and to use
that value (rather than any pre-calculated information) after the assembly
line. This makes it something like an "asm("" ::: "memory")" barrier, but
focused on a single variable, which may not need to be moved out of registers.
I've found this kind of construct "asm("" : "+g" (x))" useful to control
ordering of code around disabled interrupt sections of embedded code, and it
can also force the order of floating point calculations even while -ffast-math
is used. Usually I've used "g" as the operand constraint, and that has worked
perfectly for everything except floating point types on targets that have
dedicated floating point registers.
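
As a minimal sketch of the "+g" form described above, applied to an integer
(the macro name OPT_BARRIER_G and the function add_then_sub are hypothetical
names chosen for illustration, and this assumes a compiler that accepts GCC
extended asm):

```c
/* Hypothetical helper: an empty GCC extended asm statement.  It emits no
   instructions, but the "+g" operand tells the compiler that x is both
   read and written here, so x must be computed before this point and
   cannot be assumed to hold a known value afterwards. */
#define OPT_BARRIER_G(x) __asm__ ("" : "+g" (x))

int add_then_sub(int a, int b)
{
    int x = a + b;
    OPT_BARRIER_G(x);   /* a + b is evaluated before this line */
    return x - b;       /* not expected to be folded to just "a" */
}
```

The function still returns a; the barrier is only meant to stop the compiler
from simplifying (a + b) - b across the asm statement.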
I had expected "+X" (option 1 above) to work smoothly for floating point types.
And it does, sometimes - other times I get an error:
error: inconsistent operand constraints in an 'asm'
I've tested mainly on x86-64 (current trunk, which will be 14.0) and ARM 32-bit
(13.2), with -O2 optimisation. I get no errors with -O0, but there is no need
of an optimisation barrier if optimisation is not enabled! The exact compiler
version does not seem to make a difference, but the target does, as does the
type T.
I can't see why there should be a difference in the way these three variations
work, or why some combinations give errors. When there is no compiler error,
the code (in all my testing) has been optimal as intended.
Some test results:

T                  opt  arm-32  x86-64
float              1    fail    fail
float              2    fail    ok
float              3    ok      ok
int                1    fail    fail
int                2    fail    ok
int                3    ok      ok
double             1    fail    fail
double             2    fail    ok
double             3    ok      ok
long long int      1    ok      fail
long long int      2    ok      ok
long long int      3    ok      ok
_Complex double    1    ok      ok
_Complex double    2    fail    fail
_Complex double    3    ok      ok

T = long double :
(arm-32 has 64-bit long double, so the results there are not relevant)
opt 1 gives "inconsistent operand constraints in an 'asm'"
opt 2 and 3 give "output constraint 0 must specify a single register", which
is an understandable error.
It looks like the "+X" constraint only works reliably if preceded by the
assembly line with the "X" input constraint. For my own use, I can live with
that - it all goes in a macro anyway. But maybe it points to a deeper issue,
or at least a potential for neatening things up.
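
A sketch of the kind of macro this could go in, pairing the "X" input line
with the "+X" line as in option 3 (the names OPT_BARRIER_X and add_then_sub_d
are hypothetical; going by the results above this form compiled for double on
both targets tested, but nothing beyond those tests is implied):

```c
/* Hypothetical wrapper for the option 3 workaround: an empty asm with an
   "X" input operand, followed by an empty asm with a "+X" operand on the
   same variable.  Neither statement emits any instructions. */
#define OPT_BARRIER_X(x)             \
    do {                             \
        __asm__ ("" :: "X" (x));     \
        __asm__ ("" : "+X" (x));     \
    } while (0)

double add_then_sub_d(double a, double b)
{
    double x = a + b;
    OPT_BARRIER_X(x);   /* x is computed before this point */
    return x - b;       /* and is treated as unknown afterwards */
}
```

The do/while wrapper is only there so that the macro behaves as a single
statement when used inside an unbraced if or else.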