On 29/07/2024 22:59, Andres Freund wrote:
> After being confused for a while, I realized the explanation is fairly
> simple: we use volatile and dereference the address:
> static __inline__ int
> tas(volatile slock_t *lock)
> {
>     slock_t _res = 1;
>
>     __asm__ __volatile__(
>         "   lock            \n"
>         "   xchgb   %0,%1   \n"
>         : "+q"(_res), "+m"(*lock)
>         : /* no inputs */
>         : "memory", "cc");
>     return (int) _res;
> }
> (note the (*lock) and the volatile in the signature).
>
> I think it'd be just as defensible to not emit a separate load here,
> despite the volatile, and indeed clang doesn't emit a separate load. But
> it also seems defensible to translate the code very literally, as gcc
> does.
> If I remove the volatile from the signature or cast it away, gcc indeed
> generates the offset version:
>
>     4230: f0 86 82 c0 01 00 00  lock xchg %al,0x1c0(%rdx)

Good catch. Seems safe to just remove the volatile.
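
Something like this, presumably (a sketch only; many call sites still
pass volatile pointers, so in practice this might mean casting the
qualifier away rather than changing the signature):

static __inline__ int
tas(slock_t *lock)
{
    slock_t _res = 1;

    /*
     * Without the volatile qualifier, gcc can fold the lock's offset
     * into the xchg memory operand instead of computing the address
     * separately; the asm volatile and the "memory" clobber still
     * provide the required ordering.
     */
    __asm__ __volatile__(
        "   lock            \n"
        "   xchgb   %0,%1   \n"
        : "+q"(_res), "+m"(*lock)
        : /* no inputs */
        : "memory", "cc");
    return (int) _res;
}
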
> A second, even smaller, issue with the code is that we use "lock xchgb"
> despite xchg having had an implied lock approximately forever ([2]).
> That makes the code slightly wider than necessary (the lock prefix is
> one byte).
>
> I doubt there are many situations where these end up having a
> meaningful performance impact, but it still seems suboptimal. I may be
> seeing a *small* gain in a workload inserting lots of tiny records, but
> it's hard to be sure if it's above the noise floor.
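
For illustration, dropping the prefix just means removing the first
line of the asm; xchg with a memory operand is locked whether or not
the prefix is written out:

    __asm__ __volatile__(
        "   xchgb   %0,%1   \n"    /* implicitly locked: memory operand */
        : "+q"(_res), "+m"(*lock)
        : /* no inputs */
        : "memory", "cc");
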
> I'm wondering in how many places our fairly broad use of volatile
> causes substantially worse code to be generated.

Aside from performance, I find "volatile" difficult to reason about. I
feel more comfortable with atomics and memory barriers.
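
For comparison, here is what a test-and-set looks like with C11 atomics
(a sketch, not a patch; in-tree this would presumably build on our own
atomics abstraction rather than <stdatomic.h> directly):

#include <stdatomic.h>

typedef atomic_flag slock_t;

static inline int
tas(slock_t *lock)
{
    /*
     * Returns the previous value, like the asm version. On x86-64 this
     * should compile to a single xchgb, with the lock's offset folded
     * into the memory operand and no explicit lock prefix.
     */
    return atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}
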
--
Heikki Linnakangas
Neon (https://neon.tech)