On Wed, Aug 25, 2021 at 10:58 AM Robert Haas <robertmh...@gmail.com> wrote:
> Makes sense.

I'm glad that the big picture stuff makes sense to you.

> I think one of the big implementation challenges here is
> coping with the scenario where there's not enough shared memory
> available ... or else somehow making that impossible without reserving
> an unreasonable amount of shared memory.

Yes, it'll definitely be necessary to nail that down.

> If you allowed space for
> every buffer to belong to a different relation and have the maximum
> number of leases and whatever, you'd probably have no possibility of
> OOM, but you'd probably be pre-reserving too much memory.

I hope that we can control the shared memory space overhead by making
it a function of max_connections, plus some configurable number of
relations that get modified within a single transaction. This approach
must behave in the same way when the number of tables that each
transaction actually modifies is high -- perhaps a transaction that
does this then pays a penalty in WAL logging within the FSM. I think
that that can be made manageable, especially if we can pretty much
impose the cost directly on those transactions that need to modify
lots of relations all at once. (If we can reuse the shared memory over
time it'll help too.)

> I also think
> there are some implementation challenges around locking.

That seems likely.

> You probably
> need some, because the data structure is shared, but because it's
> complex, it's not easy to create locking that allows for good
> concurrency. Or so I think.

My hope is that this design more than makes up for it by relieving
contention in other areas. Like buffer lock contention, or relation
extension lock contention.

> Andres has been working -- I think for years now -- on replacing the
> buffer mapping table with a radix tree of some kind. That strikes me
> as very similar to what you're doing here.
> The per-relation data can
> then include not only the kind of stuff you're talking about but very
> fundamental things like how long it is and where its buffers are in
> the buffer pool. Hopefully we don't end up with dueling patches.

I agree that there is definitely some overlap. I see no risk of a real
conflict, though. I have mostly been approaching this project as an
effort to fix the locality problems, mostly by looking for fixes to
the BenchmarkSQL workload's problems. I have to admit that the big
picture stuff about exploiting transactional semantics with free space
management is still pretty aspirational. The resource management parts
of my prototype patch are by far the kludgiest parts.

I hope that I can benefit from whatever work Andres has already done
on this, particularly when it comes to managing per-relation metadata
in shared memory.

--
Peter Geoghegan