Hi,
  With LSE enabled by default, a few failures in libgomp happen.  The
shortest testcase I came up with was:

extern void abort (void);
int x = 6;
int f(void) __attribute__((noinline,noclone));
int f(void) { return 32; }

int main ()
{
  int v, l = 2, s = 1;
  x = f();
  #pragma omp atomic capture
    v = x = 5 | x;
  if (v != 37)
    abort ();
  return 0;
}
--- CUT ---

What happened was that the register allocator decided to use the same
register for the input as for the clobber register:

Before:
(insn 13 12 14 2 (parallel [
            (set (reg:SI 74 [ _5 ])
                (ior:SI (mem/v:SI (reg/f:DI 76) [-1 S4 A32])
                    (reg:SI 80)))
            (set (mem/v:SI (reg/f:DI 76) [-1 S4 A32])
                (unspec_volatile:SI [
                        (mem/v:SI (reg/f:DI 76) [-1 S4 A32])
                        (reg:SI 80)
                        (const_int 0 [0])
                    ] UNSPECV_ATOMIC_LDOP))
            (clobber (scratch:SI))
        ]) t.c:14 2895 {aarch64_atomic_or_fetchsi_lse}
     (expr_list:REG_DEAD (reg:SI 80)
        (expr_list:REG_DEAD (reg/f:DI 76)
            (nil))))

After:
(insn 13 12 14 2 (parallel [
            (set (reg:SI 2 x2 [orig:74 _5 ] [74])
                (ior:SI (mem/v:SI (reg/f:DI 1 x1 [76]) [-1 S4 A32])
                    (reg:SI 0 x0 [80])))
            (set (mem/v:SI (reg/f:DI 1 x1 [76]) [-1 S4 A32])
                (unspec_volatile:SI [
                        (mem/v:SI (reg/f:DI 1 x1 [76]) [-1 S4 A32])
                        (reg:SI 0 x0 [80])
                        (const_int 0 [0])
                    ] UNSPECV_ATOMIC_LDOP))
            (clobber (reg:SI 0 x0 [82]))
        ]) t.c:14 2895 {aarch64_atomic_or_fetchsi_lse}
     (nil))

Then the split came along and used the clobber register as a temporary
to store the result of the ldset, and then did an or of that register
with the original input register.  Since both are now x0, the input
value has already been overwritten by the time the or executes.  This is
incorrect: the clobber register needs to be marked as early clobber so
that it does not match up with the input register.

This obvious patch fixes the problem by marking the clobber register as
an early clobber, so the register allocator does not choose the same
register as an input register.

Committed as obvious after a bootstrap/test on aarch64-linux-gnu
configured with and without --with-cpu=thunderx+lse on a pass 2 ThunderX
CPU (which has ARMv8.1 support).

Thanks,
Andrew

ChangeLog:
2015-12-20  Andrew Pinski  <apin...@cavium.com>

	* config/aarch64/atomics.md
	(aarch64_atomic_<atomic_optab>_fetch<mode>_lse): Add early clobber
	to the scratch register.

Index: config/aarch64/atomics.md
===================================================================
--- config/aarch64/atomics.md	(revision 231852)
+++ config/aarch64/atomics.md	(working copy)
@@ -428,7 +428,7 @@
 	  (match_dup 2)
 	  (match_operand:SI 3 "const_int_operand")]
 	 UNSPECV_ATOMIC_LDOP))
-   (clobber (match_scratch:ALLI 4 "=r"))]
+   (clobber (match_scratch:ALLI 4 "=&r"))]
   "TARGET_LSE"
   "#"
   "&& reload_completed"
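
For illustration only (not part of the patch): a minimal C sketch of why sharing the scratch register with the input gives the wrong fetched value. It assumes the split sequence is an ldset into the scratch register followed by an or of the scratch with the input, as described above. The names ldset_model, or_fetch_split and run_or_fetch, and the small regs[] array standing in for the register file, are hypothetical and exist only for this sketch.

```c
#include <stdint.h>

/* Hypothetical model of LDSET: OR `bits` into *mem and return the
   value *mem held before the update.  */
static uint32_t ldset_model(uint32_t *mem, uint32_t bits)
{
  uint32_t old = *mem;
  *mem = old | bits;
  return old;
}

/* Assumed shape of the post-split sequence for or_fetch:
     tmp = ldset (mem, in)
     out = tmp | in
   regs[] stands in for the register file; in/tmp/out are indices.  */
static uint32_t or_fetch_split(uint32_t regs[], int in, int tmp, int out,
                               uint32_t *mem)
{
  regs[tmp] = ldset_model(mem, regs[in]); /* destroys the input if tmp == in */
  regs[out] = regs[tmp] | regs[in];
  return regs[out];
}

/* Run the testcase values (x == 32, operand 5) with a given register
   assignment and return the fetched result.  */
uint32_t run_or_fetch(int in, int tmp, int out)
{
  uint32_t regs[4] = {0, 0, 0, 0};
  uint32_t x = 32;

  regs[in] = 5;
  return or_fetch_split(regs, in, tmp, out, &x);
}
```

With the allocation from the "After" dump (input and scratch both in register 0), run_or_fetch (0, 0, 2) yields 32, because the or combines the loaded value with itself. With a distinct scratch, which is what the early clobber forces, run_or_fetch (0, 3, 2) yields the expected 37. The value stored to memory is 37 in both cases; only the fetched result is wrong, which is why the testcase aborts on v.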