On Sat, Jul 6, 2019 at 4:08 PM Peter Geoghegan <p...@bowt.ie> wrote: > I took a closer look at this patch, and have some general thoughts on > its design, and specific feedback on the implementation.
I have some high level concerns about how the patch might increase contention, which could make queries slower. Apparently that is a real problem in other systems that use MVCC when their bitmap index feature is used -- they are never really supposed to be used with OLTP apps. This patch makes nbtree behave rather a lot like a bitmap index. That's not exactly true, because you're not storing a bitmap or compressing the TID lists, but they're definitely quite similar. It's easy to imagine a hybrid approach, that starts with a B-Tree with deduplication/TID lists, and eventually becomes a bitmap index as more duplicates are added [1]. It doesn't seem like it would be practical for these other MVCC database systems to have standard B-Tree secondary indexes that compress duplicates gracefully in the way that you propose to with this patch, because lock contention would presumably be a big problem for the same reason as it is with their bitmap indexes (whatever the true reason actually is). Is it really possible to have something that's adaptive, offering the best of both worlds? Having dug into it some more, I think that the answer for us might actually be "yes, we can have it both ways". Other database systems that are also based on MVCC still probably use a limited form of index locking, even in READ COMMITTED mode, though this isn't very widely known. They need this for unique indexes, but they also need it for transaction rollback, to remove old entries from the index when the transaction must abort. The section "6.7 Standard Practice" from the paper "Architecture of a Database System" [2] goes into this, saying: "All production databases today support ACID transactions. As a rule, they use write-ahead logging for durability, and two-phase locking for concurrency control. An exception is PostgreSQL, which uses multiversion concurrency control throughout." I suggest reading "6.7 Standard Practice" in full. Anyway, I think that *hundreds* or even *thousands* of rows are effectively locked all at once when a bitmap index needs to be updated in these other systems -- and I mean a heavyweight lock that lasts until the xact commits or aborts, like a Postgres row lock. As I said, this is necessary simply because the transaction might need to roll back. Of course, your patch never needs to do anything like that -- the only risk is that buffer lock contention will be increased. Maybe VACUUM isn't so bad after all! Doing deduplication adaptively and automatically in nbtree seems like it might play to the strengths of Postgres, while also ameliorating its weaknesses. As the same paper goes on to say, it's actually quite unusual that PostgreSQL has *transactional* full text search built in (using GIN), and offers transactional, high concurrency spatial indexing (using GiST). Actually, this is an additional advantages of our "pure" approach to MVCC -- we can add new high concurrency, transactional access methods relatively easily. [1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.3159&rep=rep1&type=pdf [2] http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf -- Peter Geoghegan