https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110096

andysem at mail dot ru changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andysem at mail dot ru

--- Comment #11 from andysem at mail dot ru ---
(In reply to Andrew Pinski from comment #10)
> (In reply to Peter Dimov from comment #9)
> > I don't think I want WFE here, based on what I read about it. Putting the
> > core to sleep seems like something to do in an embedded system where I have
> > full control of what cores do, not something to do on the application level,
> > in a portable C++ library.
> 
> No, WFE can be used in userspace just fine and in fact it will be
> interrupted every once in a while. yield only sleeps for a few cycles and
> then wakes up, while wfe will sleep until an event happens (also WFE is very
> hypervisor friendly too).

Spin locks are used when latency is a concern and when the protected region is
extremely small (i.e. a few instructions). Putting the core to sleep until the
next interrupt does not seem appropriate for this purpose. x86 pause and ARM
yield are better suited precisely because they wait for a time on the order of
cycles (up to about a hundred on recent x86) rather than microseconds or more.

I can add that in most spin lock implementations I have seen, either yield, nop
or nothing is used for wasting CPU cycles. A few examples:

https://www.boost.org/doc/libs/1_82_0/boost/fiber/detail/cpu_relax.hpp
https://elixir.bootlin.com/linux/latest/source/arch/arm/include/asm/vdso/processor.h#L11
https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/vdso/processor.h#L12
https://chromium.googlesource.com/chromium/src/third_party/WebKit/Source/wtf/+/823d62cdecdbd5f161634177e130e5ac01eb7b48/SpinLock.cpp

The first link has instructions for a few other architectures besides ARM and
x86. I agree with Peter that an architecture-neutral intrinsic could be useful
to avoid this kind of code being duplicated across projects. I do realize
that specifying the exact behavior of this intrinsic would be difficult, since
even the underlying instructions are defined rather vaguely. Still, one thing
is certain: this intrinsic must be a full compiler fence.
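
To make the idea concrete, here is a rough sketch of the kind of helper that
projects end up duplicating today. It is only an illustration under stated
assumptions: the names cpu_relax and spin_lock are hypothetical (this is not an
existing GCC intrinsic), GCC/Clang-style inline asm is assumed, and only x86
and ARM are covered. The "memory" clobber is what makes each variant act as a
compiler fence, including the fallback that emits no instruction.

```cpp
#include <atomic>

// Hypothetical helper (not an existing GCC builtin): a short spin-wait hint.
// Assumes a GCC/Clang-compatible compiler for the inline asm syntax.
inline void cpu_relax() noexcept {
#if defined(__i386__) || defined(__x86_64__)
    // x86: pause; the "memory" clobber forbids reordering memory accesses
    // across this statement at compile time.
    __asm__ __volatile__("pause" ::: "memory");
#elif defined(__aarch64__) || (defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7)
    // ARM: yield, again with a "memory" clobber.
    __asm__ __volatile__("yield" ::: "memory");
#else
    // No known hint for this target: emit nothing, but keep the compiler fence.
    __asm__ __volatile__("" ::: "memory");
#endif
}

// Minimal test-and-test-and-set spin lock using the helper above.
// A sketch only; intended for very short critical sections.
class spin_lock {
    std::atomic<bool> locked_{false};

public:
    // Returns true if the lock was acquired without waiting.
    bool try_lock() noexcept {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void lock() noexcept {
        for (;;) {
            if (try_lock())
                return;
            // Spin on a plain load so waiters do not keep writing to the
            // cache line, and relax the core between polls.
            while (locked_.load(std::memory_order_relaxed))
                cpu_relax();
        }
    }

    void unlock() noexcept {
        locked_.store(false, std::memory_order_release);
    }
};
```

The lock's correctness comes from the atomic operations and their memory
orderings, not from cpu_relax; the hint only affects how the waiting core
behaves while it polls.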
