Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Tom Lane
Andrew Dunstan writes: > On 2025-03-04 Tu 5:28 PM, Tom Lane wrote: >> ... I eventually concluded that there's >> something wrong with the "scalar glob()" idiom you used. > Well, in scalar context it should give us back the first item found, or > undef if nothing is found, AIUI. That's what I wo

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Andrew Dunstan
On 2025-03-04 Tu 6:04 PM, Tom Lane wrote: Andrew Dunstan writes: Will check your patch out too. Comparing previous run against current, I now see that my patch caused it to skip these steps: module-ldap_password_func-check module-pg_bsd_indent-check contrib-sepgsql-check Skipping the ldap an

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Tom Lane
Andrew Dunstan writes: > I'm looking at something else, namely the attached. Yeah, that avoids the extra installs and brings sifaka's runtime back to about what it had been. regards, tom lane

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Andrew Dunstan
On 2025-03-04 Tu 5:28 PM, Tom Lane wrote: Andrew Dunstan writes: I think I found a logic bug. Testing. Not sure what you are looking at, but I was trying to fix it by making the loop over test modules skip unbuilt modules, borrowing the test you added in v19 to skip unbuilt contrib modules.

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Tom Lane
Andrew Dunstan writes: > Will check your patch out too. Comparing previous run against current, I now see that my patch caused it to skip these steps: module-ldap_password_func-check module-pg_bsd_indent-check contrib-sepgsql-check Skipping the ldap and sepgsql tests is desirable, but it sho

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Andrew Dunstan
On 2025-03-04 Tu 5:28 PM, Tom Lane wrote: Andrew Dunstan writes: I think I found a logic bug. Testing. Not sure what you are looking at, but I was trying to fix it by making the loop over test modules skip unbuilt modules, borrowing the test you added in v19 to skip unbuilt contrib modules. I

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Tom Lane
Andrew Dunstan writes: >> I think I found a logic bug. Testing. Oh! I bet you are looking at this 18-to-19 diff: @@ -416,7 +416,8 @@ sub check_install_is_complete { $tmp_loc = "$tmp_loc/$install_dir"; $bindir = "$tmp_loc/bin"; - $libdir = "$

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Tom Lane
Andrew Dunstan writes: > I think I found a logic bug. Testing. Not sure what you are looking at, but I was trying to fix it by making the loop over test modules skip unbuilt modules, borrowing the test you added in v19 to skip unbuilt contrib modules. It's a little more complicated for the other

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Andrew Dunstan
On 2025-03-04 Tu 5:01 PM, Tom Lane wrote: Andres Freund writes: On 2025-03-04 16:30:34 -0500, Tom Lane wrote: Yeah, I've been poking at that. It's not at all clear why the animal is trying to run src/test/modules/ldap_password_func now when it didn't before. It did do so before as well, af

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Tom Lane
Andres Freund writes: > On 2025-03-04 16:30:34 -0500, Tom Lane wrote: >> Yeah, I've been poking at that. It's not at all clear why the >> animal is trying to run src/test/modules/ldap_password_func >> now when it didn't before. > It did do so before as well, afaict: > https://buildfarm.postgresq

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Andres Freund
Hi, On 2025-03-04 16:30:34 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2025-03-04 19:58:38 +0100, Tomas Vondra wrote: > >> I noticed sifaka started failing right after I pushed this: > > > It's worth noting that > > a) sifaka doesn't build with ldap support > > b) the failure is in che

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Tom Lane
Andres Freund writes: > On 2025-03-04 19:58:38 +0100, Tomas Vondra wrote: >> I noticed sifaka started failing right after I pushed this: > It's worth noting that > a) sifaka doesn't build with ldap support > b) the failure is in checkprep, not when running the tests > c) the buildfarm unfortunate

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Tomas Vondra
On 3/4/25 15:38, Tomas Vondra wrote: > > ... > >>> >>> Attached is a patch doing this, but considering it has nothing to do >>> with the shmem sizing, I wonder if it's worth it. >> >> Yes. >> > > OK, barring objections I'll push the v2. > Pushed, with the tweaks to use FastPathLockSlotsPerBacke

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Andres Freund
Hi, On 2025-03-04 19:58:38 +0100, Tomas Vondra wrote: > Pushed, with the tweaks to use FastPathLockSlotsPerBackend() in a couple > more places. Thanks! > I noticed sifaka started failing right after I pushed this: > > https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=sifaka&br=master

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Tomas Vondra
On 3/4/25 14:11, Andres Freund wrote: > Hi, > > On 2025-03-04 14:05:22 +0100, Tomas Vondra wrote: >> On 3/3/25 21:52, Andres Freund wrote: It's not a proper constant, of course, but it seemed close enough. Yes, it might confuse people into thinking it's a constant, or is there som

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Andres Freund
Hi, On 2025-03-04 14:05:22 +0100, Tomas Vondra wrote: > On 3/3/25 21:52, Andres Freund wrote: > >> It's not a proper constant, of course, but it seemed close > >> enough. Yes, it might confuse people into thinking it's a constant, or > >> is there some additional impact? > > > > That seems plenty

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-04 Thread Tomas Vondra
On 3/3/25 21:52, Andres Freund wrote: > Hi, > > On 2025-03-03 21:31:42 +0100, Tomas Vondra wrote: >> On 3/3/25 19:10, Andres Freund wrote: >>> On 2024-09-21 20:33:49 +0200, Tomas Vondra wrote: I've finally pushed this, after many rounds of careful testing to ensure no regressions, and

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-03 Thread Andres Freund
Hi, On 2025-03-03 21:31:42 +0100, Tomas Vondra wrote: > On 3/3/25 19:10, Andres Freund wrote: > > On 2024-09-21 20:33:49 +0200, Tomas Vondra wrote: > >> I've finally pushed this, after many rounds of careful testing to ensure > >> no regressions, and polishing. > > > > One minor nit: I don't like

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-03 Thread Tomas Vondra
On 3/3/25 19:10, Andres Freund wrote: > Hi, > > On 2024-09-21 20:33:49 +0200, Tomas Vondra wrote: >> I've finally pushed this, after many rounds of careful testing to ensure >> no regressions, and polishing. > > One minor nit: I don't like that FP_LOCK_SLOTS_PER_BACKEND is now non-constant > w

Re: scalability bottlenecks with (many) partitions (and more)

2025-03-03 Thread Andres Freund
Hi, On 2024-09-21 20:33:49 +0200, Tomas Vondra wrote: > I've finally pushed this, after many rounds of careful testing to ensure > no regressions, and polishing. One minor nit: I don't like that FP_LOCK_SLOTS_PER_BACKEND is now non-constant while looking like a constant: #define FP_LOCK_

Re: scalability bottlenecks with (many) partitions (and more)

2024-11-20 Thread Matthias van de Meent
On Wed, 4 Sept 2024 at 17:32, Tomas Vondra wrote: > > On 9/4/24 16:25, Matthias van de Meent wrote: > > On Tue, 3 Sept 2024 at 18:20, Tomas Vondra wrote: > >> FWIW the actual cost is somewhat higher, because we seem to need ~400B > >> for every lock (not just the 150B for the LOCK struct). > > >

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-23 Thread Tom Lane
Tomas Vondra writes: > Thanks. Pushed a fix for these issues, hopefully coverity will be happy. Thanks. > BTW is the coverity report accessible somewhere? I know someone > mentioned that in the past, but I don't recall the details. Maybe we > should have a list of all these resources, useful for

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-23 Thread Tomas Vondra
On 9/23/24 01:06, Tom Lane wrote: > Tomas Vondra writes: >> On 9/22/24 17:45, Tom Lane wrote: >>> #define FAST_PATH_GROUP(index) \ >>> - (AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \ >>> + (AssertMacro((uint32) (index) < FP_LOCK_SLOTS_PER_BACKEND), \ >>> ((index

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-23 Thread Jakub Wartak
On Mon, Sep 16, 2024 at 4:19 PM Tomas Vondra wrote: > On 9/16/24 15:11, Jakub Wartak wrote: > > On Fri, Sep 13, 2024 at 1:45 AM Tomas Vondra wrote: > > > >> [..] > > > >> Anyway, at this point I'm quite happy with this improvement. I didn't > >> have any clear plan when to commit this, but I'm co

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-22 Thread Tom Lane
Tomas Vondra writes: > On 9/22/24 17:45, Tom Lane wrote: >> #define FAST_PATH_GROUP(index) \ >> -(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \ >> +(AssertMacro((uint32) (index) < FP_LOCK_SLOTS_PER_BACKEND), \ >> ((index) / FP_LOCK_SLOTS_PER_GROUP)) > For t
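
The diff quoted above replaces a two-sided range check with a single comparison after a cast to uint32. A minimal Python sketch of why the two forms agree for 32-bit signed indexes follows. The constant values are assumptions (1024 slots in 64 groups of 16, taken from the figures mentioned elsewhere in this thread), and the helper names are hypothetical, not PostgreSQL identifiers.

```python
# Assumed sizes, for illustration only: 64 groups of 16 slots each.
FP_LOCK_SLOTS_PER_GROUP = 16
FP_LOCK_SLOTS_PER_BACKEND = 1024


def as_uint32(value: int) -> int:
    """Mimic a C cast to uint32 by wrapping the value modulo 2**32."""
    return value & 0xFFFFFFFF


def in_range_two_tests(index: int) -> bool:
    """Old form of the check: explicit lower and upper bound."""
    return index >= 0 and index < FP_LOCK_SLOTS_PER_BACKEND


def in_range_one_test(index: int) -> bool:
    """New form: a negative index wraps to a huge unsigned value and fails."""
    return as_uint32(index) < FP_LOCK_SLOTS_PER_BACKEND


def fast_path_group(index: int) -> int:
    """Hypothetical Python counterpart of FAST_PATH_GROUP(index)."""
    assert in_range_one_test(index)
    return index // FP_LOCK_SLOTS_PER_GROUP
```

The equivalence only holds for values that fit in a 32-bit signed int, which is what the C macro receives; Python integers are unbounded, so the sketch should not be fed larger values.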

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-22 Thread Tomas Vondra
On 9/22/24 17:45, Tom Lane wrote: > Tomas Vondra writes: >> I've finally pushed this, after many rounds of careful testing to ensure >> no regressions, and polishing. > > Coverity is not terribly happy with this. "Assert(fpPtr = fpEndPtr);" > is very clearly not doing what you presumably intende

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-22 Thread Tom Lane
Tomas Vondra writes: > I've finally pushed this, after many rounds of careful testing to ensure > no regressions, and polishing. Coverity is not terribly happy with this. "Assert(fpPtr = fpEndPtr);" is very clearly not doing what you presumably intended. The others look like overaggressive asse
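
The "Assert(fpPtr = fpEndPtr);" problem is an assignment where a comparison was meant. A rough Python analog, using an assignment expression, shows the two effects: the check passes whenever the assigned value is truthy, and the left-hand variable is silently overwritten. The variable names and values are illustrative only and are not taken from the PostgreSQL source.

```python
fp_ptr = 3       # hypothetical current position
fp_end_ptr = 7   # hypothetical expected end position

# Intended check: are the two positions equal? Here they are not.
intended = (fp_ptr == fp_end_ptr)

# Mistaken form: an assignment expression. Its value is fp_end_ptr,
# which is non-zero, so the assertion passes even though the
# positions differed.
assert (fp_ptr := fp_end_ptr)

# Side effect of the mistake: fp_ptr has been overwritten.
overwritten = fp_ptr
```

In C the same pattern compiles without error, which is why a static analyzer such as Coverity is useful for catching it.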

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-22 Thread Tomas Vondra
On 9/22/24 10:50, Ants Aasma wrote: > On Sat, 21 Sept 2024 at 21:33, Tomas Vondra wrote: >> I've finally pushed this, after many rounds of careful testing to ensure >> no regressions, and polishing. All changes since the version shared on >> September 13 are only cosmetic - renaming a macro to

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-22 Thread Ants Aasma
On Sat, 21 Sept 2024 at 21:33, Tomas Vondra wrote: > I've finally pushed this, after many rounds of careful testing to ensure > no regressions, and polishing. All changes since the version shared on > September 13 are only cosmetic - renaming a macro to keep it consistent > with the other ones, cl

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-21 Thread Tomas Vondra
Hi, I've finally pushed this, after many rounds of careful testing to ensure no regressions, and polishing. All changes since the version shared on September 13 are only cosmetic - renaming a macro to keep it consistent with the other ones, clarifying a couple comments etc. Nothing major. I ended

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-16 Thread Tomas Vondra
On 9/16/24 15:11, Jakub Wartak wrote: > On Fri, Sep 13, 2024 at 1:45 AM Tomas Vondra wrote: > >> [..] > >> Anyway, at this point I'm quite happy with this improvement. I didn't >> have any clear plan when to commit this, but I'm considering doing so >> sometime next week, unless someone objec

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-16 Thread Jakub Wartak
On Fri, Sep 13, 2024 at 1:45 AM Tomas Vondra wrote: > [..] > Anyway, at this point I'm quite happy with this improvement. I didn't > have any clear plan when to commit this, but I'm considering doing so > sometime next week, unless someone objects or asks for some additional > benchmarks etc. T

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-12 Thread Tomas Vondra
Turns out there was a bug in EXEC_BACKEND mode, causing failures on the Windows machine in CI. AFAIK the reason is pretty simple - the backends don't see the number of fast-path groups postmaster calculated from max_locks_per_transaction. Fixed that by calculating it again in AttachSharedMemoryStr

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-12 Thread Tomas Vondra
Hi, I've spent quite a bit of time trying to identify cases where having more fast-path lock slots could be harmful, without any luck. I started with the EPYC machine I used for the earlier tests, but found nothing, except for a couple cases unrelated to this patch, because it affects even cases w

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-06 Thread Jakub Wartak
On Thu, Sep 5, 2024 at 7:33 PM Tomas Vondra wrote: >>> My $0.02 cents: the originating case that triggered those patches, >>> actually started with LWLock/lock_manager waits being the top#1. The >>> operator can cross check (join) that with a group by pg_locks.fastpath >>> (='f'), count(*). So, I

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-05 Thread Tomas Vondra
On 9/4/24 13:15, Tomas Vondra wrote: > On 9/4/24 11:29, Jakub Wartak wrote: >> Hi Tomas! >> >> ... >> >> My $0.02 cents: the originating case that triggered those patches, >> actually started with LWLock/lock_manager waits being the top#1. The >> operator can cross check (join) that with a group by

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-05 Thread Tomas Vondra
Hi, Here's a somewhat more polished version of this patch series. I only propose 0001 and 0002 for eventual commit, the two other bits are just stuff to help with benchmarking etc. 0001 increases the size of the arrays, but uses a hard-coded number of groups (64, so 1024 locks) and leaves everythin
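
The figures given for 0001 (64 groups, 1024 locks) imply 16 slots per group. A small sketch of that sizing arithmetic follows. The function names are hypothetical, and the rounding-up rule in groups_needed is an assumption made for illustration, not something stated in the message.

```python
SLOTS_PER_GROUP = 16  # implied by 1024 locks / 64 groups


def slots_per_backend(groups: int) -> int:
    """Total fast-path slots available to one backend."""
    return groups * SLOTS_PER_GROUP


def groups_needed(locks: int) -> int:
    """Smallest number of whole groups covering `locks` slots (assumed rule)."""
    return max(1, -(-locks // SLOTS_PER_GROUP))  # ceiling division


print(slots_per_backend(64))   # 1024
print(groups_needed(1024))     # 64
```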

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-05 Thread Robert Haas
On Tue, Sep 3, 2024 at 12:19 PM Tomas Vondra wrote: > > Doing some worst case math, suppose somebody has max_connections=1000 > > (which is near the upper limit of what I'd consider a sane setting) > > and max_locks_per_transaction=10000 (ditto). The product is 10 > > million, so every 10 bytes of
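
The worst-case arithmetic quoted above can be written out as a short sketch. It assumes max_locks_per_transaction=10000, the value implied by the stated product of 10 million with max_connections=1000; the function names are hypothetical.

```python
def lock_table_entries(max_connections: int,
                       max_locks_per_transaction: int) -> int:
    """Number of entries in the worst-case scenario from the message."""
    return max_connections * max_locks_per_transaction


def added_bytes(entries: int, bytes_per_entry: int) -> int:
    """Extra memory if each entry grows by bytes_per_entry."""
    return entries * bytes_per_entry


entries = lock_table_entries(1000, 10_000)
print(entries)                    # 10000000
print(added_bytes(entries, 10))   # 100000000, i.e. 100 MB (decimal)
```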

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-04 Thread Tomas Vondra
On 9/4/24 17:12, David Rowley wrote: > On Wed, 4 Sept 2024 at 03:06, Robert Haas wrote: >> >> On Mon, Sep 2, 2024 at 1:46 PM Tomas Vondra wrote: >>> But say we add a GUC and set it to -1 by default, in which case it just >>> inherits the max_locks_per_transaction value. And then also provide

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-04 Thread Tomas Vondra
On 9/4/24 16:25, Matthias van de Meent wrote: > On Tue, 3 Sept 2024 at 18:20, Tomas Vondra wrote: >> FWIW the actual cost is somewhat higher, because we seem to need ~400B >> for every lock (not just the 150B for the LOCK struct). > > We do indeed allocate two PROCLOCKs for every LOCK, and alloca

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-04 Thread David Rowley
On Wed, 4 Sept 2024 at 03:06, Robert Haas wrote: > > On Mon, Sep 2, 2024 at 1:46 PM Tomas Vondra wrote: > > But say we add a GUC and set it to -1 by default, in which case it just > > inherits the max_locks_per_transaction value. And then also provide some > > basic metric about this fast-path ca

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-04 Thread Matthias van de Meent
On Tue, 3 Sept 2024 at 18:20, Tomas Vondra wrote: > FWIW the actual cost is somewhat higher, because we seem to need ~400B > for every lock (not just the 150B for the LOCK struct). We do indeed allocate two PROCLOCKs for every LOCK, and allocate those inside dynahash tables. That amounts to (152+
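
To see what the difference between the 152-byte LOCK struct and the roughly 400 bytes per lock means in total, here is a rough estimate under simplifying assumptions. It uses the stock defaults max_connections=100 and max_locks_per_transaction=64 and ignores prepared transactions and auxiliary processes. The 400-byte figure is the approximate number quoted in the thread, not a measured value.

```python
LOCK_STRUCT_BYTES = 152       # size of LOCK as quoted in the thread
APPROX_BYTES_PER_LOCK = 400   # approximate full per-lock cost quoted above


def shared_lock_bytes(max_connections: int,
                      max_locks_per_transaction: int,
                      bytes_per_lock: int) -> int:
    """Simplified estimate: entries times per-entry cost."""
    return max_connections * max_locks_per_transaction * bytes_per_lock


struct_only = shared_lock_bytes(100, 64, LOCK_STRUCT_BYTES)
full_cost = shared_lock_bytes(100, 64, APPROX_BYTES_PER_LOCK)
print(struct_only)  # 972800
print(full_cost)    # 2560000
```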

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-04 Thread Tomas Vondra
On 9/4/24 11:29, Jakub Wartak wrote: > Hi Tomas! > > On Tue, Sep 3, 2024 at 6:20 PM Tomas Vondra wrote: >> >> On 9/3/24 17:06, Robert Haas wrote: >>> On Mon, Sep 2, 2024 at 1:46 PM Tomas Vondra wrote: The one argument to not tie this to max_locks_per_transaction is the vastly different

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-04 Thread Jakub Wartak
Hi Tomas! On Tue, Sep 3, 2024 at 6:20 PM Tomas Vondra wrote: > > On 9/3/24 17:06, Robert Haas wrote: > > On Mon, Sep 2, 2024 at 1:46 PM Tomas Vondra wrote: > >> The one argument to not tie this to max_locks_per_transaction is the > >> vastly different "per element" memory requirements. If you ad

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-03 Thread Tomas Vondra
On 9/3/24 17:06, Robert Haas wrote: > On Mon, Sep 2, 2024 at 1:46 PM Tomas Vondra wrote: >> The one argument to not tie this to max_locks_per_transaction is the >> vastly different "per element" memory requirements. If you add one entry >> to max_locks_per_transaction, that adds LOCK which is a wh

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-03 Thread Robert Haas
On Mon, Sep 2, 2024 at 1:46 PM Tomas Vondra wrote: > The one argument to not tie this to max_locks_per_transaction is the > vastly different "per element" memory requirements. If you add one entry > to max_locks_per_transaction, that adds LOCK which is a whopping 152B. > OTOH one fast-path entry i

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-02 Thread Tomas Vondra
On 9/2/24 01:53, Robert Haas wrote: > On Sun, Sep 1, 2024 at 3:30 PM Tomas Vondra wrote: >> I don't think that's possible with hard-coded size of the array - that >> allocates the memory for everyone. We'd need to make it variable-length, >> and while doing those benchmarks I think we actually alr

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-01 Thread Robert Haas
On Sun, Sep 1, 2024 at 3:30 PM Tomas Vondra wrote: > I don't think that's possible with hard-coded size of the array - that > allocates the memory for everyone. We'd need to make it variable-length, > and while doing those benchmarks I think we actually already have a GUC > for that - max_locks_pe

Re: scalability bottlenecks with (many) partitions (and more)

2024-09-01 Thread Tomas Vondra
Hi, While discussing this patch with Robert off-list, one of the questions he asked was whether there's some size threshold after which it starts to have a negative impact. I didn't have a good answer to that - I did have some intuition (that making it too large would not hurt), but I haven't done any te

Re: scalability bottlenecks with (many) partitions (and more)

2024-08-05 Thread Tomas Vondra
Hi, On 6/25/24 12:04, Tomas Vondra wrote: On 6/24/24 17:05, Robert Haas wrote: On Sun, Jan 28, 2024 at 4:57 PM Tomas Vondra wrote: For NUM_LOCK_PARTITIONS this is pretty simple (see 0001 patch). The LWLock table has 16 partitions by default - it's quite possible that on machine with many co

Re: scalability bottlenecks with (many) partitions (and more)

2024-06-25 Thread Robert Haas
On Tue, Jun 25, 2024 at 6:04 AM Tomas Vondra wrote: > Yeah, definitely needs comment explaining this. > > I admit those numbers are pretty arbitrary primes, to implement a > trivial hash function. That was good enough for a PoC patch, but maybe > for a "proper" version this should use a better has
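
As a rough illustration of a "trivial hash function" built from a prime multiplier, the sketch below maps a relation OID to one of a fixed number of groups. The multiplier, the group count and the function name are assumptions chosen for the example and are not necessarily the values used in the PoC patch.

```python
NUM_GROUPS = 64      # assumed number of fast-path groups
MULTIPLIER = 49157   # illustrative prime, not necessarily the patch's value


def group_for_relation(rel_oid: int) -> int:
    """Map a relation OID to a group index in [0, NUM_GROUPS)."""
    return (rel_oid * MULTIPLIER) % NUM_GROUPS


# Because the multiplier is odd, 64 consecutive OIDs land in 64
# distinct groups.
oids = range(16384, 16384 + NUM_GROUPS)
distinct_groups = {group_for_relation(oid) for oid in oids}
print(len(distinct_groups))  # 64
```

An odd multiplier is coprime with a power-of-two group count, so multiplication permutes the residues; whether this spreads real workloads well is a separate question, which is what the request for a better hash is about.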

Re: scalability bottlenecks with (many) partitions (and more)

2024-06-25 Thread Tomas Vondra
On 6/24/24 17:05, Robert Haas wrote: > On Sun, Jan 28, 2024 at 4:57 PM Tomas Vondra > wrote: >> For NUM_LOCK_PARTITIONS this is pretty simple (see 0001 patch). The >> LWLock table has 16 partitions by default - it's quite possible that on >> machine with many cores and/or many partitions, we ca

Re: scalability bottlenecks with (many) partitions (and more)

2024-06-24 Thread Robert Haas
On Sun, Jan 28, 2024 at 4:57 PM Tomas Vondra wrote: > For NUM_LOCK_PARTITIONS this is pretty simple (see 0001 patch). The > LWLock table has 16 partitions by default - it's quite possible that on > machine with many cores and/or many partitions, we can easily hit this. > So I bumped this 4x to 64

Re: scalability bottlenecks with (many) partitions (and more)

2024-01-31 Thread Tomas Vondra
On 1/29/24 16:42, Ronan Dunklau wrote: > Le lundi 29 janvier 2024, 15:59:04 CET Tomas Vondra a écrit : >> I'm not sure work_mem is a good parameter to drive this. It doesn't say >> how much memory we expect the backend to use - it's a per-operation >> limit, so it doesn't work particularly well w

Re: scalability bottlenecks with (many) partitions (and more)

2024-01-29 Thread Ronan Dunklau
Le lundi 29 janvier 2024, 15:59:04 CET Tomas Vondra a écrit : > I'm not sure work_mem is a good parameter to drive this. It doesn't say > how much memory we expect the backend to use - it's a per-operation > limit, so it doesn't work particularly well with partitioning (e.g. with > 100 partitions,

Re: scalability bottlenecks with (many) partitions (and more)

2024-01-29 Thread Tomas Vondra
On 1/29/24 15:15, Ronan Dunklau wrote: > Le lundi 29 janvier 2024, 13:17:07 CET Tomas Vondra a écrit : >>> Did you try running an strace on the process ? That may give you some >>> hindsights into what malloc is doing. A more sophisticated approach would >>> be using stap and plugging it into th

Re: scalability bottlenecks with (many) partitions (and more)

2024-01-29 Thread Ronan Dunklau
Le lundi 29 janvier 2024, 13:17:07 CET Tomas Vondra a écrit : > > Did you try running an strace on the process ? That may give you some > > hindsights into what malloc is doing. A more sophisticated approach would > > be using stap and plugging it into the malloc probes, for example > > memory_sbrk

Re: scalability bottlenecks with (many) partitions (and more)

2024-01-29 Thread Tomas Vondra
On 1/29/24 09:53, Ronan Dunklau wrote: > Le dimanche 28 janvier 2024, 22:57:02 CET Tomas Vondra a écrit : > > Hi Tomas ! > > I'll comment on glibc-malloc part as I studied that part last year, and > proposed some things here: https://www.postgresql.org/message-id/ > 3424675.QJadu78ljV%40aivenlap

Re: scalability bottlenecks with (many) partitions (and more)

2024-01-29 Thread Ronan Dunklau
Le dimanche 28 janvier 2024, 22:57:02 CET Tomas Vondra a écrit : Hi Tomas ! I'll comment on glibc-malloc part as I studied that part last year, and proposed some things here: https://www.postgresql.org/message-id/ 3424675.QJadu78ljV%40aivenlaptop > FWIW where does the malloc overhead come from