Hi,

On Wed, Oct 28, 2020 at 12:34:58PM +0500, Andrey Borodin wrote:
Tomas, thanks for looking into this!

On 28 Oct 2020, at 06:36, Tomas Vondra <tomas.von...@2ndquadrant.com> 
wrote:


This thread started with a discussion about making the SLRU sizes
configurable, but this patch version only adds a local cache. Does this
achieve the same goal, or would we still gain something by having GUCs
for the SLRUs?

If we're claiming this improves performance, it'd be good to have some
workload demonstrating that and measurements. I don't see anything like
that in this thread, so it's a bit hand-wavy. Can someone share details
of such workload (even synthetic one) and some basic measurements?

All patches in this thread aim at the same goal: improving performance in the 
presence of MultiXact lock contention.
I could not build a synthetic reproduction of the problem; however, I did some 
MultiXact stress testing here [0]. It's a clumsy test program, because it is 
still not clear to me which workload parameters trigger MultiXact lock 
contention. In the generic case I was encountering other locks such as the 
*GenLock ones: XidGenLock, MultixactGenLock etc. Yet our production system has 
hit this problem approximately once a month throughout this year.

The test program locks different sets of tuples for share in the presence of 
concurrent full scans.
To produce a set of locks we choose one of 14 bits. If a row number has this 
bit set to 0, we lock that row.
I measured the time to lock all rows 3 times for each of the 14 bits, observing 
the total time to set all locks.
During the test I was watching the locks in pg_stat_activity; if they did not 
contain enough MultiXact locks, I tuned the parameters further (number of 
concurrent clients, number of bits, select queries etc.).
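
To make the row selection above concrete, here is a minimal sketch of how one 
bit picks the rows to lock. It is not the actual test program from [0]; the 
function names, the table name "t" and the key column "id" are hypothetical, 
and the generated query only assumes an integer key column.

```python
def rows_to_lock(num_rows: int, bit: int) -> list[int]:
    """Return the row numbers in [0, num_rows) whose given bit is 0."""
    mask = 1 << bit
    return [row for row in range(num_rows) if (row & mask) == 0]


def lock_statement(bit: int, table: str = "t", key: str = "id") -> str:
    """Build a SELECT ... FOR SHARE over rows whose key has `bit` set to 0.

    The table and key names are placeholders, not taken from the test program.
    """
    return f"SELECT * FROM {table} WHERE (({key} >> {bit}) & 1) = 0 FOR SHARE"


if __name__ == "__main__":
    # Each of the 14 bits selects a different half of the rows, so different
    # clients end up share-locking overlapping sets of tuples.
    for bit in range(14):
        print(bit, len(rows_to_lock(1 << 14, bit)), lock_statement(bit))
```

Because every bit selects half of the row numbers, any two bits overlap on a 
quarter of the rows, which is what lets several lockers pile up on the same 
tuples.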

Why is it so complicated? It seems that other reproductions of the problem 
were encountering other locks.


It's not my intention to be mean or anything like that, but to me this
means we don't really understand the problem we're trying to solve. Had
we understood it, we should be able to construct a workload reproducing
the issue ...

I understand what the individual patches are doing, and maybe those
changes are desirable in general. But without any benchmarks from a
plausible workload I find it hard to convince myself that:

(a) it actually will help with the issue you're observing on production

and

(b) it's actually worth the extra complexity (e.g. the lwlock changes)


I'm willing to invest some of my time into reviewing/testing this, but I
think we badly need better insight into the issue, so that we can build
a workload reproducing it. Perhaps collecting some perf profiles and a
sample of the queries might help, but I assume you already tried that.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
