Hi,

The current wait events are already pretty useful. But I think we could make them more informative without adding real runtime overhead.
1) For lwlocks I think it'd be quite useful to show the mode of acquisition in pg_stat_activity.wait_event_type, instead of just saying 'LWLock'. I think we should split PG_WAIT_LWLOCK into PG_WAIT_LWLOCK_{EXCLUSIVE,SHARED,WAIT_UNTIL_FREE}, and report a different wait_event_type based on the class (rough sketch at the end of this mail). The fact that it'd break people explicitly looking for 'LWLock' in pg_stat_activity doesn't seem to outweigh the benefits to me.

2) I think it's unhelpful that waits for WAL insertion locks to progress show up as LWLock acquisitions. LWLockWaitForVar() feels like a distinct enough operation that passing in a caller-specified wait event is worth the minuscule incremental overhead that'd add. I'd probably just make it a different wait class, and have xlog.c compute the event based on the number of the slot being waited for (see the second sketch below).

3) I have observed waking up other processes as part of a lock release to be a significant performance factor. I would like to add a separate wait event type for that. That'd be a near-trivial extension to 1).

I also think there's a 4), but there the tradeoffs are a bit more complicated:

4) For a few types of lwlock just knowing the tranche isn't sufficient. E.g. knowing whether one or several different buffer mapping locks are being waited on is important to judge contention.

Right now wait events use 1 byte for the class, 1 byte is unused, and 2 bytes carry event-specific information (the tranche, in the case of lwlocks). Seems like we could change the split to a 4 bit class, leaving 28 bit for the specific wait event? And in the lwlock case we could then use something like 4 bit class, 8 bit tranche, 20 bit sub-tranche, which adds up to 32 (see the last sketch below). 20 bit aren't enough to uniquely identify a lock in the larger tranches (mostly buffer locks, I think), but I think it'd still be enough to disambiguate.

The hardest part would be knowing how to identify individual locks. The easiest would probably be to mask in part of the lwlock's address (e.g. shift it right by INTALIGN, and then mask the result into the eventId). That seems a bit unsatisfying.

We could probably do a bit better: we could store the tranche / offset-within-tranche information at LWLockInitialize() time, instead of computing something just before waiting. While LWLock.tranche is only 16 bits right now, the following two bytes are currently padding... That'd allow us to have proper numerical identification for nearly all tranches, without needing to go back to the complexity of having tranches specify base & stride. Even more API churn around lwlock initialization isn't desirable :(, but we could just add a LWLockInitializeIdentified() or such.
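For 1) and 3), a rough sketch of what I have in mind. The new class byte values are placeholders; they'd have to be picked to not collide with the existing PG_WAIT_* classes:

/* one wait class per acquisition mode, plus one for wakeup during release */
#define PG_WAIT_LWLOCK_EXCLUSIVE        0x01000000U
#define PG_WAIT_LWLOCK_SHARED           0x0B000000U
#define PG_WAIT_LWLOCK_WAIT_UNTIL_FREE  0x0C000000U
#define PG_WAIT_LWLOCK_WAKEUP           0x0D000000U

static inline uint32
LWLockWaitClass(LWLockMode mode)
{
    switch (mode)
    {
        case LW_SHARED:
            return PG_WAIT_LWLOCK_SHARED;
        case LW_WAIT_UNTIL_FREE:
            return PG_WAIT_LWLOCK_WAIT_UNTIL_FREE;
        default:
            return PG_WAIT_LWLOCK_EXCLUSIVE;
    }
}

/*
 * In LWLockAcquire() etc., instead of today's
 * pgstat_report_wait_start(PG_WAIT_LWLOCK | lock->tranche):
 */
pgstat_report_wait_start(LWLockWaitClass(mode) | lock->tranche);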
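For 2), the signature change would be roughly the following. PG_WAIT_WAL_INSERT is made up, and or'ing the slot number into the event only works out if the encoding leaves room for it, cf. 4):

extern bool LWLockWaitForVar(LWLock *lock, uint64 *valptr,
                             uint64 oldval, uint64 *newval,
                             uint32 wait_event_info);

/* in xlog.c's WaitXLogInsertionsToFinish(), something like: */
LWLockWaitForVar(&WALInsertLocks[i].l.lock,
                 &WALInsertLocks[i].l.insertingAt,
                 insertingat, &insertingat,
                 PG_WAIT_WAL_INSERT | i);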
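For 4), a sketch of both the encoding and of storing the identification at initialization time. All new names are made up; with only the two padding bytes available the in-struct id is 16 bit, which fits comfortably into the 20 bit of event space:

/* 4 bit class, 8 bit tranche, 20 bit sub-tranche */
#define LWLOCK_WAIT_EVENT(classId, tranche, sub) \
    (((uint32) (classId) << 28) | \
     (((uint32) (tranche) & 0xFF) << 20) | \
     ((uint32) (sub) & 0xFFFFF))

/* the unsatisfying address-based variant, computed just before waiting */
sub = ((uintptr_t) lock >> 4) & 0xFFFFF;

/* storing the id at initialization time instead, in the bytes that are
 * currently padding: */
typedef struct LWLock
{
    uint16      tranche;        /* tranche ID */
    uint16      subtranche;     /* offset within tranche; was padding */
    pg_atomic_uint32 state;     /* state of exclusive/nonexclusive lockers */
    proclist_head waiters;      /* list of waiting PGPROCs */
} LWLock;

void
LWLockInitializeIdentified(LWLock *lock, int tranche_id, uint32 sub_id)
{
    lock->tranche = tranche_id;
    /* truncation is acceptable, we only need to disambiguate */
    lock->subtranche = (uint16) sub_id;
    pg_atomic_init_u32(&lock->state, LW_FLAG_RELEASE_OK);
    proclist_init(&lock->waiters);
}

Greetings,

Andres Freund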