Hello, Simon. В Пт, 25/02/2022 в 04:35 +0000, Simon Riggs пишет: > On Mon, 21 Feb 2022 at 08:06, Yura Sokolov <y.soko...@postgrespro.ru> wrote: > > Good day, Kyotaro Horiguchi and hackers. > > > > В Чт, 17/02/2022 в 14:16 +0900, Kyotaro Horiguchi пишет: > > > At Wed, 16 Feb 2022 10:40:56 +0300, Yura Sokolov > > > <y.soko...@postgrespro.ru> wrote in > > > > Hello, all. > > > > > > > > I thought about patch simplification, and tested version > > > > without BufTable and dynahash api change at all. > > > > > > > > It performs suprisingly well. It is just a bit worse > > > > than v1 since there is more contention around dynahash's > > > > freelist, but most of improvement remains. > > > > > > > > I'll finish benchmarking and will attach graphs with > > > > next message. Patch is attached here. > > > > > > Thanks for the new patch. The patch as a whole looks fine to me. But > > > some comments needs to be revised. > > > > Thank you for review and remarks. > > v3 gets the buffer partition locking right, well done, great results! > > In v3, the comment at line 1279 still implies we take both locks > together, which is not now the case. > > Dynahash actions are still possible. You now have the BufTableDelete > before the BufTableInsert, which opens up the possibility I discussed > here: > http://postgr.es/m/CANbhV-F0H-8oB_A+m=55hP0e0QRL=rdddqusxmtft6jprdx...@mail.gmail.com > (Apologies for raising a similar topic, I hadn't noticed this thread > before; thanks to Horiguchi-san for pointing this out). > > v1 had a horrible API (sorry!) where you returned the entry and then > explicitly re-used it. I think we *should* make changes to dynahash, > but not with the API you proposed. > > Proposal for new BufTable API > BufTableReuse() - similar to BufTableDelete() but does NOT put entry > back on freelist, we remember it in a private single item cache in > dynahash > BufTableAssign() - similar to BufTableInsert() but can only be > executed directly after BufTableReuse(), fails with ERROR otherwise. > Takes the entry from single item cache and re-assigns it to new tag > > In dynahash we have two new modes that match the above > HASH_REUSE - used by BufTableReuse(), similar to HASH_REMOVE, but > places entry on the single item cache, avoiding freelist > HASH_ASSIGN - used by BufTableAssign(), similar to HASH_ENTER, but > uses the entry from the single item cache, rather than asking freelist > This last call can fail if someone else already inserted the tag, in > which case it adds the single item cache entry back onto freelist > > Notice that single item cache is not in shared memory, so on abort we > should give it back, so we probably need an extra API call for that > also to avoid leaking an entry.
Why there is need for this? Which way backend could be forced to abort between BufTableReuse and BufTableAssign in this code path? I don't see any CHECK_FOR_INTERRUPTS on the way, but may be I'm missing something. > > Doing it this way allows us to > * avoid touching freelists altogether in the common path - we know we > are about to reassign the entry, so we do remember it - no contention > from other backends, no borrowing etc.. > * avoid sharing the private details outside of the dynahash module > * allows us to use the same technique elsewhere that we have > partitioned hash tables > > This approach is cleaner than v1, but should also perform better > because there will be a 1:1 relationship between a buffer and its > dynahash entry, most of the time. Thank you for suggestion. Yes, it is much clearer than my initial proposal. Should I incorporate it to v4 patch? Perhaps, it could be a separate commit in new version. > > With these changes, I think we will be able to *reduce* the number of > freelists for partitioned dynahash from 32 to maybe 8, as originally > speculated by Robert in 2016: > > https://www.postgresql.org/message-id/CA%2BTgmoZkg-04rcNRURt%3DjAG0Cs5oPyB-qKxH4wqX09e-oXy-nw%40mail.gmail.com > since the freelists will be much less contended with the above approach > > It would be useful to see performance with a higher number of connections, > >400. > > -- > Simon Riggs http://www.EnterpriseDB.com/ ------ regards, Yura Sokolov