Using read_stream in index vacuum

Andrey M. Borodin Sat, 19 Oct 2024 02:39:26 -0700

Hi hackers!

At a recent hacking workshop [0], Thomas mentioned that patches using the new
read_stream API would be welcome.
So I prototyped streaming reads for B-tree vacuum as a basis for discussion.
When cleaning an index we must visit every index tuple, so we uphold a
special invariant:
after the scan processes the trailing block, that block must still be the
last one according to a subsequent RelationGetNumberOfBlocks(rel) call.


This invariant does not allow us to completely replace the block loop with
a read stream. That's why streaming is used only for the number of blocks
returned by the first RelationGetNumberOfBlocks(rel) call. The tail is
processed with regular ReadBufferExtended().

It's also worth mentioning that we sometimes have to jump back to blocks to
the left of the current scan position, following the right-links of recently
split pages. We do this with regular ReadBufferExtended() as well. That's why
btvacuumpage() now accepts a buffer rather than a block number.


I've benchmarked the patch on my laptop (MacBook Air M3) with the following
workload:
1. Initialization
create unlogged table x as select random() r from generate_series(1,1e7);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
vacuum;
2. pgbench with 1 client, running this script:
insert into x select random() from generate_series(0,10) x;
vacuum x;

On my laptop I see a ~3% increase in pgbench TPS (from ~101 to ~104),
but the statistical noise is very significant, larger than the performance
change itself. Perhaps a less noisy benchmark can be devised.

What do you think? If this approach seems worthwhile, I can adapt the same
technique to other AMs.


Best regards, Andrey Borodin.

[0] https://rhaas.blogspot.com/2024/08/postgresql-hacking-workshop-september.html

0001-Prototype-B-tree-vacuum-streamlineing.patch