https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096
andysem at mail dot ru changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andysem at mail dot ru

--- Comment #11 from andysem at mail dot ru ---
(In reply to Andrew Pinski from comment #10)
> (In reply to Peter Dimov from comment #9)
> > I don't think I want WFE here, based on what I read about it. Putting the
> > core to sleep seems like something to do in an embedded system where I have
> > full control of what cores do, not something to do on the application level,
> > in a portable C++ library.
>
> No, WFE can be used in userspace just fine and in fact it will be
> interrupted every once in a while. yield only sleeps for a few cycles and
> then wakes up, while wfe will sleep until an event happens (also WFE is very
> hypervisor friendly too).

Spin locks are used when latency is a concern and when the protected region is
extremely small (i.e. a few instructions). Putting the core to sleep until the
next interrupt does not seem appropriate for this purpose. x86 pause and ARM
yield are better suited exactly because they wait for a time in the order of
cycles (up to about a hundred on recent x86) rather than microseconds or more.

I can add that in most spin lock implementations I have seen, either yield, nop
or nothing is used for wasting CPU cycles. A few examples:

https://www.boost.org/doc/libs/1_82_0/boost/fiber/detail/cpu_relax.hpp
https://elixir.bootlin.com/linux/latest/source/arch/arm/include/asm/vdso/processor.h#L11
https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/vdso/processor.h#L12
https://chromium.googlesource.com/chromium/src/third_party/WebKit/Source/wtf/+/823d62cdecdbd5f161634177e130e5ac01eb7b48/SpinLock.cpp

The first link has instructions for a few other architectures besides ARM and
x86. I agree with Peter that an architecture-neutral intrinsic could be useful
to avoid this kind of code duplicated in various projects.
I do realize that specifying the exact behavior of this intrinsic would be difficult, since even the underlying instructions are defined rather vaguely. However, one thing is certain: this intrinsic must be a full compiler fence.
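
For illustration, here is a minimal sketch of the kind of helper that projects currently duplicate, assuming a GCC-compatible compiler with extended inline asm. The names cpu_relax and spin_lock are hypothetical; this is neither an existing GCC builtin nor the code from any of the links above. The "memory" clobber is what makes each branch a compiler fence, and the fallback uses std::atomic_signal_fence, which constrains only the compiler and emits no instruction.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical architecture-neutral spin-wait hint (not a GCC builtin).
// Every branch acts as a full compiler fence.
inline void cpu_relax() noexcept
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    // x86: pause, with a "memory" clobber so the compiler cannot
    // cache or reorder memory accesses across the hint.
    __asm__ __volatile__("pause" ::: "memory");
#elif defined(__GNUC__) && (defined(__aarch64__) || \
      (defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7))
    // ARM: yield, again with a "memory" clobber.
    __asm__ __volatile__("yield" ::: "memory");
#else
    // Unknown target: no hint instruction, compiler fence only.
    std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
}

// Hypothetical test-and-test-and-set spin lock using the hint above.
class spin_lock
{
    std::atomic<bool> locked_{false};

public:
    bool try_lock() noexcept
    {
        // exchange returns the previous value; false means we took the lock.
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void lock() noexcept
    {
        while (!try_lock())
        {
            // Spin on a plain load to avoid hammering the cache line
            // with read-modify-write operations while waiting.
            while (locked_.load(std::memory_order_relaxed))
                cpu_relax();
        }
    }

    void unlock() noexcept
    {
        // Debug-only sanity check: unlocking a lock that is not held is a bug.
        assert(locked_.load(std::memory_order_relaxed));
        locked_.store(false, std::memory_order_release);
    }
};
```

The lock works with std::lock_guard<spin_lock> in the usual way. Whether pause/yield or something like WFE is the right hint is exactly the open question in this thread; the sketch only shows the pause/yield variant.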