On Sun, Jun 12, 2011 at 12:15:29AM -0400, Robert Haas wrote:
> On Sat, Jun 11, 2011 at 11:40 PM, Noah Misch <n...@leadboat.com> wrote:
> > We currently achieve that wait-free by first marking the page with the next
> > available xid and then reusing it when that mark (btpo.xact) predates the
> > oldest running xid (RecentXmin). (At the moment, I'm failing to work out why
> > this is OK with scans from transactions that haven't allocated an xid, but I
> > vaguely recall convincing myself it was fine at one point.) It would indeed
> > also be enough to call GetLockConflicts(locktag-of-index, AccessExclusiveLock)
> > and check whether any of the returned transactions have PGPROC.xmin below the
> > mark. That's notably more expensive than just comparing RecentXmin, so I'm
> > not sure how well it would pay off overall. However, it could only help us on
> > the master. (Not strictly true, but any way I see to extend it to the standby
> > has critical flaws.) On the master, we can see a conflicting transaction and
> > put off reusing the page. By the time the record hits the standby, we have to
> > apply it, and we might have a running transaction that will hold a lock on the
> > index for the next, say, 72 hours. At such times, vacuum_defer_cleanup_age or
> > hot_standby_feedback ought to prevent the recovery stall.
> >
> > This did lead me to realize that what we do in this regard on the standby can
> > be considerably independent from what we do on the master. If fruitful, the
> > standby can prove the absence of a scan holding a right-link in a completely
> > different fashion. So, we *could* take the cleanup-lock approach on the
> > standby without changing very much on the master.
>
> Well, I'm generally in favor of trying to fix this problem without
> changing what the master does. It's a weakness of our replication
> technology that the standby has no better way to cope with a cleanup
> operation on the master than to start killing queries, but then again
> it's a weakness of our MVCC technology that we don't reuse space
> quickly enough and end up with bloat. I hear a lot more complaints
> about the second weakness than I do about the first.

I fully agree. That said, if this works on the standby, we may as well also use
it opportunistically on the master, to throttle bloat.

> At any rate, if taking a cleanup lock on the right-linked page on the
> standby is sufficient to fix the problem, that seems like a far
> superior solution in any case. Presumably the frequency of someone
> having a pin on that particular page will be far lower than any
> matching based on XID or heavyweight locks. And the vast majority of
> such pins should disappear before the startup process feels obliged to
> get out its big hammer.

Yep; looks promising.

Does such a thing have a chance of being backpatchable? I think the chances
start slim and fall almost to zero on account of the difficulty of avoiding a
WAL format change. Assuming that conclusion, I do think it's worth starting
with something simple, even if it means additional bloat on the master in the
wal_level=hot_standby + vacuum_defer_cleanup_age / hot_standby_feedback case.
In choosing those settings, the administrator has taken constructive steps to
accept master-side bloat in exchange for delaying recovery conflict. What's
your opinion?

Thanks,
nm
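
For anyone skimming the thread later, here is a rough, self-contained sketch of
the two reuse tests being compared above: the xid mark checked against the
oldest running xid, and the "nobody else has the page pinned" condition that a
cleanup lock amounts to. This is a toy model, not PostgreSQL source. ToyXid,
ToyPage and every toy_* function are hypothetical names made up for
illustration; the server itself uses TransactionIdPrecedes() and
LockBufferForCleanup(), which do more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction id. */
typedef uint32_t ToyXid;

/* Hypothetical model of a deleted index page. */
typedef struct ToyPage
{
    ToyXid deleted_xact;   /* mark stamped at deletion, in the role of btpo.xact */
    int    other_pins;     /* pins held by backends other than the caller */
} ToyPage;

/*
 * Wraparound-aware "a is older than b" using modulo-2^32 arithmetic.
 * Assumes the usual two's-complement result for the uint32 -> int32 cast,
 * and ignores the special (non-normal) xids that the real server handles.
 */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * xid-based test: the page may be reused once its mark predates the
 * oldest running xid (the role RecentXmin plays in the discussion).
 */
static bool
toy_reusable_by_xid(const ToyPage *page, ToyXid oldest_running)
{
    assert(page != NULL);
    return toy_xid_precedes(page->deleted_xact, oldest_running);
}

/*
 * pin-based test: a cleanup lock is obtainable only when no other backend
 * holds a pin, which shows that no scan is parked on the page.
 */
static bool
toy_reusable_by_pin(const ToyPage *page)
{
    assert(page != NULL);
    return page->other_pins == 0;
}
```

The point of the contrast: the first test depends on a cluster-wide xid
horizon, so one long-running transaction anywhere blocks reuse, while the
second depends only on who is touching that one page right now.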