Hi,

On 2022-10-29 18:33:53 +0530, Bharath Rupireddy wrote:
> On Sat, Oct 29, 2022 at 8:24 AM Andres Freund <and...@anarazel.de> wrote:
> >
> > Hi,
> >
> > I'm working to extract independently useful bits from my AIO work, to reduce
> > the size of that patchset. This is one of those pieces.
>
> Thanks a lot for this great work. There are 12 patches in this thread,
> I believe each of these patches is trying to solve separate problems
> and can be reviewed and get committed separately, am I correct?
Mostly, yes.

For 0001 I already started
https://www.postgresql.org/message-id/20221027165914.2hofzp4cvutj6gin%40awork3.anarazel.de
to discuss the specific issue.

We don't strictly need v1-0002-aio-Add-some-error-checking-around-pinning.patch, but I
did find it useful.

v1-0012-bufmgr-debug-Add-PrintBuffer-Desc.patch is not used in the patch
series, but I found it quite useful when debugging issues with the patch. It's
a heck of a lot easier to interpret page flags when they can be printed.

I also think there are some architectural questions that'll influence the
number of patches. E.g. I'm not convinced
v1-0010-heapam-Add-num_pages-to-RelationGetBufferForTuple.patch is quite the
right spot to track which additional pages should be used. It could very well
instead be alongside ->smgr_targblock. Possibly the best path would instead be
to return the additional pages explicitly to callers of
RelationGetBufferForTuple, but RelationGetBufferForTuple does a bunch of work
around pinning that potentially would need to be repeated in
heap_multi_insert().


> > In workloads that extend relations a lot, we end up being extremely contended
> > on the relation extension lock. We've attempted to address that to some degree
> > by using batching, which helps, but only so much.
>
> Yes, I too have observed this in the past for parallel inserts in CTAS
> work -
> https://www.postgresql.org/message-id/CALj2ACW9BUoFqWkmTSeHjFD-W7_00s3orqSvtvUk%2BKD2H7ZmRg%40mail.gmail.com.
> Tackling bulk relation extension problems will unblock the parallel
> inserts (in CTAS, COPY) work I believe.

Yea. There are a lot of places where the current approach ended up being a
bottleneck.


> Firstly, 0001 avoids an extra loop over waiters and looks like a reasonable
> change; some comments on the patch:
>
> 1)
> +	int		lwWaiting;	/* 0 if not waiting, 1 if on waitlist, 2 if
> +					 * waiting to be woken */
> Use macros instead of hard-coded values for better readability?
>
> #define PROC_LW_LOCK_NOT_WAITING 0
> #define PROC_LW_LOCK_ON_WAITLIST 1
> #define PROC_LW_LOCK_WAITING_TO_BE_WOKEN 2

Yea - this was really more of a prototype patch - I noted that we'd want to
use defines for this in
https://www.postgresql.org/message-id/20221027165914.2hofzp4cvutj6gin%40awork3.anarazel.de


> 3)
> +		proclist_delete(&lock->waiters, MyProc->pgprocno, lwWaitLink);
> +		found = true;
> I guess 'found' is a bit meaningless here as we are doing away with
> the proclist_foreach_modify loop. We can directly use
> MyProc->lwWaiting == 1 and remove 'found'.

We can rename it, but I think we still do need it; it's easier to analyze the
logic if the relevant check happens on a value read while we held the wait
list lock. We probably should do the reset inside the locked section as well
(there's a rough sketch of what I mean in the PS below).


> 4)
> if (!MyProc->lwWaiting)
> if (!proc->lwWaiting)
> Can we modify the above conditions in lwlock.c to MyProc->lwWaiting !=
> 1 or PROC_LW_LOCK_ON_WAITLIST or the macro?

I think it's better to check that it's 0, rather than just != 1.


> 5) Is there any specific test case where I can see the benefit of this
> patch? If yes, can you please share it here?

Yep, see the other thread; there's a pretty easy case there.

You can also see it at extreme client counts with pgbench -S against a cluster
with a smaller shared_buffers. But the difference is not huge before something
like 2048-4096 clients, and even then it only occurs occasionally (because you
need to end up with most connections waiting on one of the partitions). So the
test case from the other thread is a lot better.

Greetings,

Andres Freund
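
PS: To make 1) and 3) a bit more concrete, here's a rough, simplified sketch
(not the actual patch): it uses the define names you suggested above, elides
the extra-wakeup bookkeeping the real code needs, and shows the dequeue path
in lwlock.c with both the check and the reset of lwWaiting done while holding
the wait list lock.

#define PROC_LW_LOCK_NOT_WAITING		0
#define PROC_LW_LOCK_ON_WAITLIST		1
#define PROC_LW_LOCK_WAITING_TO_BE_WOKEN	2

static void
LWLockDequeueSelf(LWLock *lock)
{
	bool		on_waitlist;

	LWLockWaitListLock(lock);

	/*
	 * Check our state while holding the wait list lock - only a value read
	 * here is meaningful, since a concurrent waker could otherwise change
	 * it underneath us.
	 */
	on_waitlist = MyProc->lwWaiting == PROC_LW_LOCK_ON_WAITLIST;
	if (on_waitlist)
	{
		proclist_delete(&lock->waiters, MyProc->pgprocno, lwWaitLink);

		/* reset inside the locked section as well */
		MyProc->lwWaiting = PROC_LW_LOCK_NOT_WAITING;
	}

	LWLockWaitListUnlock(lock);

	if (!on_waitlist)
	{
		/*
		 * Somebody else already dequeued us and is going to (or already
		 * did) wake us up; absorb that wakeup so the semaphore stays
		 * balanced. (The real code additionally has to deal with spurious
		 * wakeups here, omitted for brevity.)
		 */
		PGSemaphoreLock(MyProc->sem);
	}
}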