On Mon, Oct 12, 2020 at 3:47 AM Andrey Borodin <x4...@yandex-team.ru> wrote:
> The idea looks very interesting.
> It resembles GiST microvacuum: GiST tries to vacuum single page before split.
AFAICT the GiST microvacuum mechanism is based on the one in nbtree, which is based on setting LP_DEAD bits when index scans find that the TIDs are dead-to-all. That's easy to justify, because it is cheap to save future index scans the trouble of following the TIDs just to discover the same thing for themselves.

The difference here is that we're making an intelligent guess -- there have been no index scans, but we're going to do a kind of special index scan at the last minute to see if we can avoid a page split. I think that this is okay because in practice we can do it in a reasonably targeted way. We will never do it with INSERTs, for example (except in unique indexes, and only when we know that there are duplicates because we saw them in _bt_check_unique()). In the worst case we make non-HOT updates a bit slower...and I'm not even sure that that's a bad thing when we don't manage to delete anything. After all, non-HOT updates impose a huge cost on the system. They have "negative externalities".

Another way to look at it is that the mechanism I propose to add takes advantage of "convexity" [1]. (Actually, several of my indexing ideas are based on similar principles -- like the nbtsplitloc.c stuff.)

Attached is v2. It controls the cost of visiting the heap by finding the heap page that has the most TIDs that we might be able to kill (among those that are duplicates on a leaf page); a rough standalone sketch of this block-picking idea appears at the end of this email. It also adds a hinting mechanism to the executor to avoid uselessly applying the optimization with INSERTs.

> I'm curious how the cost of page deduplication compares to the cost of a page split?
> Should we do deduplication if the page will still remain 99% full?

It depends on how you measure it, but in general I would say that the cost of traditional Postgres 13 deduplication is much lower than the cost of a page split, especially as measured in WAL volume. I also believe that the same is true for this new delete deduplication mechanism. The way we determine which heap pages to visit maximizes the chances that we'll get lucky while minimizing the cost (currently we don't visit more than 2 heap pages unless we get at least one kill -- I think I could be more conservative here without losing much). A page split also has to exclusive-lock two other pages (the right sibling page and the parent page), so even the locking is perhaps better.

The attached patch can completely or almost completely avoid index bloat in extreme cases with non-HOT updates. This can easily increase throughput by 2x or more, depending on how extreme you make it (i.e. how many indexes you have). It seems like the main problem caused by non-HOT updates is in fact page splits themselves, not so much the dirtying of pages.

You can test this with a benchmark like the one that was used for WARM back in 2017:

https://www.postgresql.org/message-id/flat/CABOikdMNy6yowA%2BwTGK9RVd8iw%2BCzqHeQSGpW7Yka_4RSZ_LOQ%40mail.gmail.com

I suspect that this patch is maybe even more effective than WARM was with this benchmark.

[1] https://fooledbyrandomness.com/ConvexityScience.pdf

--
Peter Geoghegan
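To make the block-picking heuristic a bit more concrete, here is a rough standalone sketch. It is not code from the attached patch -- every name, type, and number below is made up for illustration -- but it shows the general shape: pick the single heap block that appears most often among a leaf page's duplicate TIDs, and give up after visiting two heap pages without a single kill.

/*
 * Standalone sketch only (not from the patch): choose the heap block that
 * the largest number of candidate duplicate TIDs point to, and stop
 * visiting heap pages once two of them have been checked without any kill.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct CandidateTid
{
	uint32_t	heapblk;		/* heap block number */
	uint16_t	offnum;			/* line pointer offset within that block */
} CandidateTid;

/*
 * Return the heap block that the most candidate TIDs point to.  Visiting
 * that block first maximizes the chance that a single heap page access
 * lets us kill several index tuples at once.
 */
static uint32_t
most_promising_block(const CandidateTid *cands, int ncands)
{
	uint32_t	bestblk = cands[0].heapblk;
	int			bestcount = 0;

	for (int i = 0; i < ncands; i++)
	{
		int			count = 0;

		for (int j = 0; j < ncands; j++)
		{
			if (cands[j].heapblk == cands[i].heapblk)
				count++;
		}
		if (count > bestcount)
		{
			bestcount = count;
			bestblk = cands[i].heapblk;
		}
	}
	return bestblk;
}

/* Stand-in for the real visibility check: dead to all snapshots? */
static bool
heap_tuple_is_dead(uint32_t heapblk, uint16_t offnum)
{
	(void) offnum;
	/* pretend that everything on block 7 happens to be dead */
	return heapblk == 7;
}

int
main(void)
{
	CandidateTid cands[] = {
		{7, 1}, {7, 4}, {7, 9}, {12, 2}, {12, 5}, {31, 3},
	};
	int			ncands = (int) (sizeof(cands) / sizeof(cands[0]));
	int			pages_visited = 0;
	int			killed = 0;

	while (ncands > 0)
	{
		uint32_t	blk = most_promising_block(cands, ncands);
		int			remaining = 0;

		pages_visited++;

		/* "Visit" blk: kill what we can, keep only TIDs from other blocks */
		for (int i = 0; i < ncands; i++)
		{
			if (cands[i].heapblk == blk)
			{
				if (heap_tuple_is_dead(blk, cands[i].offnum))
					killed++;
			}
			else
				cands[remaining++] = cands[i];
		}
		ncands = remaining;

		/* Give up after two heap pages unless we got at least one kill */
		if (pages_visited >= 2 && killed == 0)
			break;
	}

	printf("visited %d heap page(s), killed %d index tuple(s)\n",
		   pages_visited, killed);
	return 0;
}

The real thing obviously has to work in terms of buffer locks, visibility checks, and WAL; the sketch is only meant to show the shape of the cost control, i.e. why the number of heap page visits stays small even when a leaf page has many duplicates.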
v2-0001-Add-delete-deduplication-to-nbtree.patch