Re: [HACKERS] On-the-fly index tuple deletion vs. hot_standby

Noah Misch Sun, 12 Jun 2011 12:02:53 -0700

On Sun, Jun 12, 2011 at 12:15:29AM -0400, Robert Haas wrote:
> On Sat, Jun 11, 2011 at 11:40 PM, Noah Misch <n...@leadboat.com> wrote:
> > We currently achieve that wait-free by first marking the page with the next
> > available xid and then reusing it when that mark (btpo.xact) predates the
> > oldest running xid (RecentXmin). ?(At the moment, I'm failing to work out 
> > why
> > this is OK with scans from transactions that haven't allocated an xid, but I
> > vaguely recall convincing myself it was fine at one point.) ?It would indeed
> > also be enough to call GetLockConflicts(locktag-of-index, 
> > AccessExclusiveLock)
> > and check whether any of the returned transactions have PGPROC.xmin below 
> > the
> > mark. ?That's notably more expensive than just comparing RecentXmin, so I'm
> > not sure how well it would pay off overall. ?However, it could only help us 
> > on
> > the master. ?(Not strictly true, but any way I see to extend it to the 
> > standby
> > has critical flaws.) ?On the master, we can see a conflicting transaction 
> > and
> > put off reusing the page. ?By the time the record hits the standby, we have 
> > to
> > apply it, and we might have a running transaction that will hold a lock on 
> > the
> > index for the next, say, 72 hours. ?At such times, vacuum_defer_cleanup_age 
> > or
> > hot_standby_feedback ought to prevent the recovery stall.
> >
> > This did lead me to realize that what we do in this regard on the standby 
> > can
> > be considerably independent from what we do on the master. ?If fruitful, the
> > standby can prove the absence of a scan holding a right-link in a completely
> > different fashion. ?So, we *could* take the cleanup-lock approach on the
> > standby without changing very much on the master.
> 
> Well, I'm generally in favor of trying to fix this problem without
> changing what the master does.  It's a weakness of our replication
> technology that the standby has no better way to cope with a cleanup
> operation on the master than to start killing queries, but then again
> it's a weakness of our MVCC technology that we don't reuse space
> quickly enough and end up with bloat.  I hear a lot more complaints
> about the second weakness than I do about the first.


I fully agree.  That said, if this works on the standby, we may as well also use
it opportunistically on the master, to throttle bloat.

> At any rate, if taking a cleanup lock on the right-linked page on the
> standby is sufficient to fix the problem, that seems like a far
> superior solution in any case.  Presumably the frequency of someone
> having a pin on that particular page will be far lower than any
> matching based on XID or heavyweight locks.  And the vast majority of
> such pins should disappear before the startup process feels obliged to
> get out its big hammer.

Yep; looks promising.

Does such a thing have a chance of being backpatchable?  I think the chances
start slim and fall almost to zero on account of the difficulty of avoiding a
WAL format change.  Assuming that conclusion, I do think it's worth starting
with something simple, even if it means additional bloat on the master in the
wal_level=hot_standby + vacuum_defer_cleanup_age / hot_standby_feedback case.
In choosing those settings, the administrator has taken constructive steps to
accept master-side bloat in exchange for delaying recovery conflict.  What's
your opinion?

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] On-the-fly index tuple deletion vs. hot_standby

Reply via email to