On 21 December 2015 at 21:28, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> Simon Riggs wrote: > > During VACUUM of btrees, we need to pin all pages, even those where > tuples > > are not removed, which I am calling here the "pin scan". This is > especially > > a problem during redo, where removing one tuple from a 100GB btree can > take > > a minute or more. That causes replication lags. Bad Thing. > > Agreed on there being a problem here. > > As a reminder, this "pin scan" is implemented in the > HotStandbyActiveInReplay() block in btree_xlog_vacuum; this routine is > in charge of replaying an action recorded by _bt_delitems_vacuum. That > replay works by acquiring cleanup lock on all btree blocks from > lastBlockVacuumed to "this one" (i.e. the one in which elements are > being removed). In essence, this means pinning *all* the blocks in the > index, which is bad. > The new code sets lastBlockVacuumed to Invalid; a new test in the replay > code makes that value not pin anything. This is supposed to optimize > the case in question. > Nice summary, thanks. > One thing that this patch should update is the comment above struct > xl_btree_vacuum, which currently state that EnsureBlockUnpinned() is one > of the options to apply to each block, but that routine doesn't exist. > Updated > One possible naive optimization is to avoid locking pages that aren't > leaf pages, but that would still require fetching the block from disk, > so it wouldn't actually optimize anything other than the cleanup lock > acquisition. (And probably not too useful, since leaf pages are said to > be 95% - 99% of index pages anyway.) > Agreed, not worth it. > Also of interest is the comment above the call to _bt_delitems_vacuum in > btvacuumpage(); that needs an update too. > Updated > It appears to me that your patch changes the call in btvacuumscan() but > not the one in btvacuumpage(). I assume you should be changing both. > Yes, that was correct. Updated. > I think the new comment that talks about Toast Index should explain > *why* we can skip the pinning in all cases except that one, instead of > just saying we can do it. > Greatly expanded comments. Thanks for the review. New patch attached. -- Simon Riggs http://www.2ndQuadrant.com/ <http://www.2ndquadrant.com/> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
avoid_standby_pin_scan.v2.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers