On Tue, Jun 7, 2022 at 5:00 PM Chinmay Kanchi <cgkan...@gmail.com> wrote: > I personally don't think this is a great replacement for a BTree index - for > one thing, it isn't really possible to use this approach beyond equality > comparisons (for scalars) or "contains"-type operations for arrays (or > tsvectors, jsonb, etc).
Why not? A bitmap is just a way of representing TIDs, that is often very space efficient once compression is applied. In principle a bitmap index can do anything that a B-Tree index can do, at least for SELECTs. Bitmap indexes are considered totally distinct to B-Tree indexes in some DB systems because the concurrency control characteristics (the use of heavyweight locks to protect the logical contents of the database) are very different . I think that this is because the index structure itself is so dense that the only practical approach that's compatible with 2PL style concurrency control (or versions to MVCC based on 2PL) is to lock a large number of TIDs at the same time. This can lead to deadlocks with even light concurrent modifications -- which would never happen with an equivalent B-Tree index. But the data structure is nevertheless more similar than different. I probably wouldn't want to have a technique like roaring bitmap compression of TIDs get applied by default within Postgres B-Trees, but the reasons for that are pretty subtle. I might still advocate *optional* TID list compression in Postgres B-Trees, which might even be something we'd end up calling a bitmap index, that would only be recommended for use in data warehousing scenarios. Extreme TID list compression isn't free -- it really isn't desirable when there are many concurrent modifications to relatively few index pages, as is common in OLTP applications. That's one important reason why bitmap indexes are generally only used in data warehousing environments, where the downside doesn't really matter, but the upside pays for itself (usually a fact table will have several bitmap indexes that are usually combined, not just one). > We ultimately abandoned that project because of the difficulty of keeping the > bitmaps in sync with changing data, which would no longer be an issue, if > this was built as an index. I think that it would in fact still be an issue if this was built as an index. There is a reason why the concurrency characteristics of bitmap indexes make them unsuitable for OLTP apps, which seems related. That wouldn't mean that it wouldn't still be worth it, but it would definitely be a real downside with some workloads. B-Tree deduplication is designed to have very little overhead with mixed reads and writes, so it's a performance all-rounder that can still be beaten by specialized techniques that come with their own downsides. -- Peter Geoghegan