Recent work from commit 5100010e taught VACUUM that it doesn't have to do index vacuuming in cases where there are practically zero (not necessarily exactly zero) tuples to delete from indexes. It also surfaces the information used to decide whether or not to skip index vacuuming, in the log output produced by the log_autovacuum_min_duration mechanism. This log output can be used to get a sense of how effective HOT is over time.
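Seeing the output in the first place requires that the relevant logging be enabled. A minimal sketch (the table name is just taken from my example further down, and 0 is the most aggressive threshold -- it makes every autovacuum get logged):

-- Log every autovacuum of one table, via the storage parameter:
ALTER TABLE bmsql_order_line SET (log_autovacuum_min_duration = 0);

-- Or for every table, in postgresql.conf:
-- log_autovacuum_min_duration = 0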
There is one number of particular interest: the proportion of heap pages that have one or more LP_DEAD items across successive VACUUMs (this is expressed as a percentage of the table). The immediate reason to expose this is that it is crucial to the skipping behavior from commit 5100010e -- the threshold for skipping is 2% of all heap pages. But that's certainly not the only reason to pay attention to the percentage. It can also be used to understand HOT in general. It can be correlated with workload spikes and stressors that tend to make HOT less effective. A number of interesting workload-specific patterns seem to emerge by focusing on how this number changes and grows over time.

I think that this should be pointed out directly in the docs. What's more, it seems like a good vehicle for discussing how HOT works in general. Why did we never really get around to documenting HOT? There should at least be some handling of how DBAs can get the most out of HOT through monitoring and through tuning -- especially by lowering heap fillfactor. It's very hard to get all UPDATEs to use HOT. It's much easier to get UPDATEs to mostly use HOT most of the time. How things change over time seems crucially important.
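To make the monitoring and tuning side of that concrete, here is the kind of thing I have in mind -- a sketch only, with an illustrative fillfactor value rather than a recommendation. The cumulative statistics system already tracks how many UPDATEs used HOT:

-- Share of each table's UPDATEs that were HOT updates:
SELECT relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 2) AS hot_update_pct
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC;

-- Leave free space on each heap page so HOT updates can allocate the new
-- tuple version on the same page (90 is only an example value; existing
-- pages are not affected until they are rewritten, e.g. by VACUUM FULL):
ALTER TABLE bmsql_order_line SET (fillfactor = 90);

The new LP_DEAD page percentage itself comes from the log output, though, which is where the example below comes in.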
I'll show one realistic example, just to give some idea of what it might look like. This is output for 3 successive autovacuums against the largest TPC-C table:

automatic vacuum of table "postgres.public.bmsql_order_line": index scans: 0
pages: 0 removed, 4668405 remain, 0 skipped due to pins, 696997 skipped frozen
tuples: 324571 removed, 186702373 remain, 333888 are dead but not yet removable, oldest xmin: 7624965
buffer usage: 3969937 hits, 3931997 misses, 1883255 dirtied
index scan bypassed: 42634 pages from table (0.91% of total) have 324364 dead item identifiers
avg read rate: 62.469 MB/s, avg write rate: 29.920 MB/s
I/O Timings: read=42359.501 write=11867.903
system usage: CPU: user: 43.62 s, system: 38.17 s, elapsed: 491.74 s
WAL usage: 4586766 records, 1850599 full page images, 8499388881 bytes

automatic vacuum of table "postgres.public.bmsql_order_line": index scans: 0
pages: 0 removed, 5976391 remain, 0 skipped due to pins, 2516643 skipped frozen
tuples: 759956 removed, 239787517 remain, 1848431 are dead but not yet removable, oldest xmin: 18489326
buffer usage: 3432019 hits, 3385757 misses, 2426571 dirtied
index scan bypassed: 103941 pages from table (1.74% of total) have 790233 dead item identifiers
avg read rate: 50.338 MB/s, avg write rate: 36.077 MB/s
I/O Timings: read=49252.721 write=17003.987
system usage: CPU: user: 45.86 s, system: 34.47 s, elapsed: 525.47 s
WAL usage: 5598040 records, 2274556 full page images, 10510281959 bytes

automatic vacuum of table "postgres.public.bmsql_order_line": index scans: 1
pages: 0 removed, 7483804 remain, 1 skipped due to pins, 4208295 skipped frozen
tuples: 972778 removed, 299295048 remain, 1970910 are dead but not yet removable, oldest xmin: 30987445
buffer usage: 3384994 hits, 4593727 misses, 2891003 dirtied
index scan needed: 174243 pages from table (2.33% of total) had 1325752 dead item identifiers removed
index "bmsql_order_line_pkey": pages: 1250660 in total, 0 newly deleted, 0 currently deleted, 0 reusable
avg read rate: 60.505 MB/s, avg write rate: 38.078 MB/s
I/O Timings: read=72881.986 write=21872.615
system usage: CPU: user: 65.24 s, system: 42.24 s, elapsed: 593.14 s
WAL usage: 6668353 records, 2684040 full page images, 12374536939 bytes

These autovacuums occur every 60-90 minutes with the workload in question (with pretty aggressive autovacuum settings). We see that HOT works rather well here -- but not so well that index vacuuming can be avoided consistently. The final autovacuum crosses the 2% threshold and so does index vacuuming (it has "index scans: 1"). There was slow but steady growth in the percentage of LP_DEAD-containing heap pages over time here, which is common enough. The point of HOT is not to avoid having to do index vacuuming, of course -- that has it backwards. But framing HOT as doing work in backends so that autovacuum doesn't have to do similar work later on is a good mental model to encourage users to adopt.

There are also significant advantages to reducing the effectiveness of HOT to this one number -- HOT must be working well if the percentage is close to 0%, almost always below 2%, with the occasional aberration that sees it go up to maybe 5%. But if it ever goes too high (in the absence of DELETEs), you might have trouble on your hands. It might not go down again.

There are other interesting patterns from other tables within the same database -- including on tables with no UPDATEs, and on tables that generally cannot use HOT due to a need to modify indexed columns. The particulars with these other tables hint at problems with heap fragmentation, which users can think of as a degenerative process -- something that gets progressively worse in extreme cases (i.e. the cases where it matters).

This new percentage metric isn't about HOT per se. It's actually about the broader question of how effective the system is at keeping the physical location of each logical row stable over time, for a given workload. So maybe that's what any new documentation should address. The documentation would still have plenty to say about HOT, though. It would also have something to say about bottom-up index deletion, which can be thought of as avoiding problems when HOT doesn't or can't be applied very often.

--
Peter Geoghegan