[Bug target/80878] -mcx16 (enable 128 bit CAS) on x86_64 seems not to work on 7.1.0

lh_mouse at 126 dot com via Gcc-bugs Sun, 10 Dec 2023 01:54:25 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878


--- Comment #42 from LIU Hao <lh_mouse at 126 dot com> ---
(In reply to Yongwei Wu from comment #27)
> Anyone can show a valid use case for a non-lock-free version of 128-bit
> atomic_compare_exchange?
> 
> I am trying to use it in a data structure intended to be lock-free. I am
> surprised to find that the C++ std::atomic::compare_exchange_weak does not
> result in lock-free code for a 128-bit struct intended for ABA-free CAS. As
> a result, the GCC-generated code is MUCH slower than the mutex-based version
> in my 8-thread contention test, defeating all its valid purposes. I am
> talking about a 10x difference. And the Clang-generated code is more than
> 200x faster in the same test.

[I think this is off topic though.]

I tested CMPXCHG16B with inline assembly on an i7-1165G7 (Dell XPS 13 9305), and
it turned out to be much slower than CMPXCHG, and even slower than a
`pthread_mutex_lock()` / `pthread_mutex_unlock()` pair. I observed similar
results on a desktop i7-11700 and a server Xeon Cascade Lake. The performance
degradation might be caused by more μops, more locking work for the wider
operand, and more cache synchronization, which makes some sense if we assume
the CPU is optimized mostly for 8-byte accesses.
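
For readers who want to repeat this kind of comparison, here is a minimal sketch of the two primitives involved. It is not the code used for the measurements above. The names `pair128`, `cas128`, `cas64` and `fallback_lock` are hypothetical and chosen only for illustration. The inline assembly assumes x86-64 with GCC or Clang extended asm; on other targets the sketch falls back to a small spinlock, which is not lock-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16-byte value, e.g. a pointer plus a version tag.
   CMPXCHG16B requires its memory operand to be 16-byte aligned. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} __attribute__((aligned(16))) pair128;

#if !defined(__x86_64__)
/* Fallback only: one global spinlock, so this path is NOT lock-free. */
static char fallback_lock;
#endif

/* 16-byte compare-and-swap. Returns true if *target matched *expected and
   was replaced by desired; otherwise copies the observed value into
   *expected and returns false. */
static inline bool cas128(volatile pair128 *target, pair128 *expected,
                          pair128 desired)
{
    assert(((uintptr_t)target & 15u) == 0);
#if defined(__x86_64__)
    unsigned char ok;
    /* RDX:RAX holds the expected value, RCX:RBX the desired value.
       ZF is set on success; on failure RDX:RAX receives the old value. */
    __asm__ __volatile__(
        "lock cmpxchg16b %1\n\t"
        "sete %0"
        : "=q"(ok), "+m"(*target), "+a"(expected->lo), "+d"(expected->hi)
        : "b"(desired.lo), "c"(desired.hi)
        : "memory", "cc");
    return ok != 0;
#else
    bool ok;
    while (__atomic_test_and_set(&fallback_lock, __ATOMIC_ACQUIRE))
        ; /* spin until the lock is free */
    ok = target->lo == expected->lo && target->hi == expected->hi;
    if (ok)
        *target = desired;
    else
        *expected = *target;
    __atomic_clear(&fallback_lock, __ATOMIC_RELEASE);
    return ok;
#endif
}

/* 8-byte compare-and-swap for comparison; on x86-64 this compiles to
   LOCK CMPXCHG. */
static inline bool cas64(uint64_t *target, uint64_t *expected,
                         uint64_t desired)
{
    return __atomic_compare_exchange_n(target, expected, desired, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

Timing a loop over `cas128()` against a loop over `cas64()`, and against a mutex lock/unlock pair, would be one way to check the difference on a given CPU. The result is likely to depend on the microarchitecture and on how contended the cache line is.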

The conclusion is probably that 16-byte compare-and-swap isn't recommended.
