https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069
--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hongyu Wang <hong...@gcc.gnu.org>:

https://gcc.gnu.org/g:0435b978f95971e139882549f5a1765c50682216

commit r12-7316-g0435b978f95971e139882549f5a1765c50682216
Author: Hongyu Wang <hongyu.w...@intel.com>
Date:   Fri Feb 11 14:44:15 2022 +0800

    i386: Relax cmpxchg instruction under -mrelax-cmpxchg-loop [PR103069]

    cmpxchg is commonly used in spin loops, and some user code, such as
    pthread, uses cmpxchg directly as the loop condition, which causes
    heavy cache bouncing.

    This patch extends the previous implementation to relax all cmpxchg
    instructions under -mrelax-cmpxchg-loop by adding an extra atomic load
    and compare, and by emulating the behavior of a failed cmpxchg.

    An original spin loop that looks like

    loop: mov    %eax,%r8d
          or     $1,%r8d
          lock cmpxchg %r8d,(%rdi)
          jne    loop

    now turns into

    loop: mov    %eax,%r8d
          or     $1,%r8d
          mov    (%r8),%rsi   <--- load lock first
          cmp    %rsi,%rax    <--- compare with expected input
          jne    .L2          <--- lock ne expected
          lock cmpxchg %r8d,(%rdi)
          jne    loop
    .L2:  mov    %rsi,%rax    <--- perform the behavior of failed cmpxchg
          jne    loop

    under -mrelax-cmpxchg-loop.

    gcc/ChangeLog:

            PR target/103069
            * config/i386/i386-expand.cc (ix86_expand_atomic_fetch_op_loop):
            Split atomic fetch and loop part.
            (ix86_expand_cmpxchg_loop): New expander for cmpxchg loop.
            * config/i386/i386-protos.h (ix86_expand_cmpxchg_loop): New
            prototype.
            * config/i386/sync.md (atomic_compare_and_swap<mode>): Call new
            expander under TARGET_RELAX_CMPXCHG_LOOP.
            (atomic_compare_and_swap<mode>): Likewise for doubleword modes.

    gcc/testsuite/ChangeLog:

            PR target/103069
            * gcc.target/i386/pr103069-2.c: Adjust result check.
            * gcc.target/i386/pr103069-3.c: New test.
            * gcc.target/i386/pr103069-4.c: Likewise.
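
For readers who want to see the idea at the source level, the relaxation described in the commit message (load first, compare with the expected value, and only then issue the locked cmpxchg) can be approximated in C with GCC's __atomic builtins. This is only a sketch under stated assumptions, not the code the new expander emits: the names relaxed_cmpxchg and set_low_bit are hypothetical, and the memory orders (a relaxed pre-load, a sequentially consistent compare-exchange) are chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): a source-level approximation of
   the relaxed cmpxchg sequence.  The memory orders are assumptions.  */
static bool
relaxed_cmpxchg (uint32_t *ptr, uint32_t *expected, uint32_t desired)
{
  assert (ptr != NULL && expected != NULL);

  /* Plain atomic load first: it does not request exclusive ownership of
     the cache line the way a locked cmpxchg does.  */
  uint32_t cur = __atomic_load_n (ptr, __ATOMIC_RELAXED);

  if (cur != *expected)
    {
      /* Emulate a failed cmpxchg: report the observed value back to the
	 caller without executing the locked instruction.  */
      *expected = cur;
      return false;
    }

  /* The value matched, so the real compare-exchange is worth trying.  */
  return __atomic_compare_exchange_n (ptr, expected, desired, false,
				      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* Hypothetical spin loop mirroring the "or $1" loop in the commit message:
   set the low bit of *lock and return the previous value.  */
static uint32_t
set_low_bit (uint32_t *lock)
{
  uint32_t old = __atomic_load_n (lock, __ATOMIC_RELAXED);

  /* On failure, relaxed_cmpxchg refreshes OLD, so the desired value is
     recomputed from the latest observation on each iteration.  */
  while (!relaxed_cmpxchg (lock, &old, old | 1u))
    ;
  return old;
}
```

With a word that initially holds 4, set_low_bit returns 4 and leaves 5 in the word. A later relaxed_cmpxchg call with a stale expected value fails on the pre-load alone and writes the current value back into the expected slot, which is the failed-cmpxchg behavior the commit message describes emulating.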