Hi! I'd like to propose a revised patch based on various ideas upthread.
This patch works as following. 1) B-tree meta page is extended with 2 additional parameters: * btm_oldest_btpo_xact – oldest btpo_xact among of deleted pages, * btm_last_cleanup_num_heap_tuples – number of heap tuples during last cleanup scan. 2) These parameters are reset during btbulkdelete() and set during btvacuumcleanup(). 3) Index scans during second and subsequent btvacuumcleanup() happen only if btm_oldest_btpo_xact is older than RecentGlobalXmin OR num_heap_tuples >= btm_last_cleanup_num_heap_tuples(1 + vacuum_cleanup_index_scale_factor). In other words btvacuumcleanup() scans the index only if there are recyclable pages, or index statistics is stalled (inserted more than vacuum_cleanup_index_scale_factor since last index statistics collection). 4) vacuum_cleanup_index_scale_factor can be set either by GUC or reloption. Default value is 0.1. So, by default cleanup scan is triggered after increasing of table size by 10%. 5) Since new fields are added to the metapage, BTREE_VERSION is bumped. In order to support pg_upgrade, read of previous metapage version is supported. On metapage rewrite, it's upgraded to the new version. So, since we don't skip scan of recyclable pages, there is no risk of xid wraparound. Risk of stalled statistics is also small, because vacuum_cleanup_index_scale_factor default value is quite low. User can increase vacuum_cleanup_index_scale_factor on his own risk and have less load of B-tree cleanup scan bought by more gap in index statistics. Some simple benchmark shows the effect. Before patch. # insert into t select i from generate_series(1,100000000) i; # create index t_i_idx on t(i); # vacuum t; VACUUM Time: 15639,822 ms (00:15,640) # insert into t select i from generate_series(1,1000) i; INSERT 0 1000 Time: 6,195 ms # vacuum t; VACUUM Time: 1012,794 ms (00:01,013) # insert into t select i from generate_series(1,1000) i; INSERT 0 1000 Time: 5,276 ms # vacuum t; VACUUM Time: 1013,254 ms (00:01,013) After patch. # insert into t select i from generate_series(1,100000000) i; # create index t_i_idx on t(i); # vacuum t; VACUUM Time: 15689,450 ms (00:15,689) # insert into t select i from generate_series(1,1000) i; INSERT 0 1000 Time: 5,585 ms # vacuum t; VACUUM Time: 50,777 ms # insert into t select i from generate_series(1,1000) i; INSERT 0 1000 Time: 5,641 ms # vacuum t; VACUUM Time: 46,997 ms Thus, vacuum time for append-only table drops from 1000 ms to 50 ms (in about 20X). ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
0001-lazy-btree-cleanup-3.patch
Description: Binary data