On Tue, Nov 12, 2019 at 3:21 PM Peter Geoghegan <p...@bowt.ie> wrote:
> * Decided to go back to turning deduplication on by default with
> non-unique indexes, and off by default using unique indexes.
>
> The unique index stuff was regressed enough with INSERT-heavy
> workloads that I was put off, despite my initial enthusiasm for
> enabling deduplication everywhere.
I have changed my mind about this again. I now think that it would
make sense to treat deduplication within unique indexes as a separate
feature that cannot be disabled by the GUC at all (though we'd
probably still respect the storage parameter for debugging purposes).
I have found that fixing the WAL record size issue has helped remove
what looked like a performance penalty for deduplication (but was
actually just a general regression).

Also, I have found a way of selectively applying deduplication within
unique indexes that seems to have no downside, and considerable
upside. The new criterion/heuristic for unique indexes is very simple:
if a unique index has an existing item that is a duplicate of the
incoming item at the point that we might have to split the page, then
apply deduplication. Otherwise (when the incoming item has no
duplicates), don't apply deduplication at all -- just accept that
we'll have to split the page. We already cache the bounds of our
initial binary search in the insert state, so we can reuse that
information within _bt_findinsertloc() when considering deduplication
in unique indexes (see the sketch below).

This heuristic makes sense because deduplication within unique
indexes should only target leaf pages that cannot possibly receive
new values. In many cases, the only reason why almost all primary key
leaf pages can ever split is because of non-HOT updates whose new HOT
chain needs a new, equal entry in the primary key. This is the case
with your standard identity column/serial primary key, for example
(only the rightmost page will have a page split due to the insertion
of new logical rows -- every other variety of page split must be due
to new physical tuples/versions).

I imagine that it is possible for a leaf page to be a "mixture" of
these two basic/general tendencies, but not for long. It really
doesn't matter if we occasionally fail to delay a page split where
that was possible, nor does it matter if we occasionally apply
deduplication when that won't delay a split for very long -- pretty
soon the page will split anyway. A split ought to separate the parts
of the keyspace that exhibit each tendency.

In general, we're only interested in delaying page splits in unique
indexes *indefinitely*, since in effect that will prevent them
*entirely*. (So the goal is *significantly* different to our general
goal for deduplication -- it's about buying time for VACUUM to run or
whatever, rather than buying space.)
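To make the shape of the check concrete, here is a minimal standalone
C sketch of the decision (illustrative only -- the low/stricthigh
bounds mirror what the insert state already caches, but the struct
and function names here are made up for the example, and are not the
patch's actual interface):

#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder for the cached binary search bounds that the insert
 * state already carries by the time _bt_findinsertloc() runs.
 */
typedef struct BTInsertStateSketch
{
    bool     bounds_valid;  /* are the cached bounds usable? */
    uint16_t low;           /* first offset the new item could go at */
    uint16_t stricthigh;    /* first offset strictly > the new item */
} BTInsertStateSketch;

/*
 * Deduplicate before splitting only when at least one existing tuple
 * on the leaf page is equal to the incoming tuple.  With cached
 * bounds, that is exactly the case where low < stricthigh: the
 * incoming item's "equal range" on the page is non-empty.  When the
 * range is empty, just accept the page split.
 */
bool
should_dedup_unique(const BTInsertStateSketch *insertstate)
{
    return insertstate->bounds_valid &&
           insertstate->low < insertstate->stricthigh;
}

The nice thing about reusing the cached bounds is that the check is
practically free -- no comparisons are needed beyond the binary
search that already happened.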
This heuristic keeps the TPC-C "old order" table's PK from bloating
quite noticeably, since that was the only unique index that is really
affected by non-HOT UPDATEs (i.e. the UPDATE queries that touch that
table happen to not be HOT-safe in general, which is not the case for
any other table). It doesn't regress anything else from TPC-C, since
there really isn't a benefit for other tables.

More importantly, the working/draft version of the patch will often
avoid a huge amount of bloat in a pgbench-style workload that has an
extra index on the pgbench_accounts table, to prevent HOT updates.
The accounts primary key (pgbench_accounts_pkey) hardly grows at all
with the patch, but grows 2x on master. This 2x space saving seems to
occur reliably, unless there is a lot of contention on individual
*pages*, in which case the bloat can be delayed but not prevented. We
get that 2x space saving with either uniformly distributed random
updates on pgbench_accounts (i.e. the pgbench default), or with a
skewed distribution that hashes the PRNG's value. Hashing like this
simulates a workload where the skew isn't concentrated in one part of
the keyspace (i.e. there is skew, but very popular values are
scattered throughout the index evenly, rather than being concentrated
together in just a few leaf pages).

Can anyone think of an adversarial case that we may not do so well on
with the new "only deduplicate within unique indexes when the new
item already has a duplicate" strategy? I'm having difficulty
identifying some kind of worst case.

--
Peter Geoghegan