On Wed, Jul 31, 2024 at 8:47 PM Heikki Linnakangas <hlinn...@iki.fi> wrote:
> On 31/07/2024 08:52, Thomas Munro wrote:
> The old __i386__ implementation of TAS() said:
>
> >  * When this was last tested, we didn't have separate TAS() and TAS_SPIN()
> >  * macros. Nowadays it probably would be better to do a non-locking test
> >  * in TAS_SPIN() but not in TAS(), like on x86_64, but no-one's done the
> >  * testing to verify that. Without some empirical evidence, better to
> >  * leave it alone.
>
> It seems that you did what the comment suggested. That seems fine. For
> sake of completeness, if someone has an i386 machine lying around, it
> would be nice to verify that. Or an official CPU manufacturer's
> implementation guide, or references to other implementations or
> something.
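
For anyone reading along without s_lock.h open, the "non-locking test"
being discussed is the test-and-test-and-set pattern.  Roughly, as an
illustrative C11 sketch (the type and function names below are made up,
this is not the real TAS()/TAS_SPIN() code): the first attempt goes
straight for the locked exchange, and the retry path spins on a plain
load first, so the cache line can stay shared until the lock looks free.

#include <stdatomic.h>
#include <stdbool.h>

typedef atomic_int my_slock_t;      /* 0 = free, 1 = held; hypothetical */

/* first attempt: just do the atomic exchange (the TAS() role) */
static inline bool
my_tas(my_slock_t *lock)
{
    /* returns true if the lock was already held */
    return atomic_exchange_explicit(lock, 1, memory_order_acquire) != 0;
}

/* retry path: cheap non-locking read first (the TAS_SPIN() role) */
static inline bool
my_tas_spin(my_slock_t *lock)
{
    if (atomic_load_explicit(lock, memory_order_relaxed) != 0)
        return true;                /* still held, don't do the locked RMW */
    return my_tas(lock);
}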
Hmm, the last "real" 32 bit CPU is from ~20 years ago.  Now the only
32 bit x86 systems we should nominally care about are modern CPUs that
can also run 32 bit instructions; is there a reason to think they'd
behave differently at this level?  Looking at the current Intel
optimisation guide's discussion of spinlock implementation at page
2-34 of [1], it doesn't distinguish between 32 and 64, and it has that
double-check thing.

> > - * On platforms with weak memory ordering, the TAS(), TAS_SPIN(), and
> > - * S_UNLOCK() macros must further include hardware-level memory fence
> > - * instructions to prevent similar re-ordering at the hardware level.
> > - * TAS() and TAS_SPIN() must guarantee that loads and stores issued after
> > - * the macro are not executed until the lock has been obtained. Conversely,
> > - * S_UNLOCK() must guarantee that loads and stores issued before the macro
> > - * have been executed before the lock is released.
>
> That old comment means that both SpinLockAcquire() and SpinLockRelease()
> acted as full memory barriers, and looking at the implementations, that
> was indeed so. With the new implementation, SpinLockAcquire() will have
> "acquire semantics" and SpinLockRelease will have "release semantics".
> That's very sensible, and I don't believe it will break anything, but
> it's a change in semantics nevertheless.

Yeah.  It's interesting that our pg_atomic_clear_flag(f) is like
standard atomic_flag_clear_explicit(f, memory_order_release), not like
atomic_flag_clear(f), which is short for atomic_flag_clear_explicit(f,
memory_order_seq_cst).  Example spinlock code I've seen written in
modern C or C++ therefore uses the _explicit variants, so it can get
acquire/release, which is what people usually want from a lock-like
thing.  What's a good way to test the performance in PostgreSQL?  In a
naive loop that just test-and-sets and clears a flag a billion times
and does nothing else (sketched below), I see a 20-40% performance
increase, depending on architecture, when comparing _seq_cst with
_acquire/_release.

You're right that this semantic change deserves explicit highlighting,
in comments somewhere...  I wonder if we have anywhere that is counting
on the stronger barrier...
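
To be concrete about that measurement, the naive loop I mean is roughly
the following throwaway sketch (not proposing this for the tree; the
iteration count and timing code are arbitrary).  Build it once as
written and once with memory_order_seq_cst in both calls, and compare
the times:

#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

int
main(void)
{
    atomic_flag f = ATOMIC_FLAG_INIT;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < 1000000000L; i++)
    {
        /* uncontended here, so this never actually spins */
        while (atomic_flag_test_and_set_explicit(&f, memory_order_acquire))
            ;
        atomic_flag_clear_explicit(&f, memory_order_release);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("%.3f s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}

[1] https://www.intel.com/content/www/us/en/content-details/671488/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html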