On Fri, 2006-10-27 at 22:19 +0100, Simon Riggs wrote:
> So we definitely have a nasty problem here.
>
> VACUUM FREEZE is just a loaded gun right now.
>
> > Maybe it's OK to say that during WAL replay we keep it
> > all the way back to the freeze horizon, but I'm not sure how we keep the
> > system from wiping clog it still needs right after switching to normal
> > operation. Maybe we should somehow not xlog updates of datvacuumxid?
>
> Thinking...

Suggestions:

1. Create a new Utility rmgr that can issue XLOG_UTIL_FREEZE messages
   for each block that has had any tuples frozen on it during normal
   VACUUMs. We need log only the relid, blockid and vacuum's xid to
   redo the freeze operation.

2. VACUUM FREEZE need not generate any additional WAL records, but will
   do an immediate sync following execution and before clog truncation.
   That way the large number of changed blocks will all reach disk
   before we do the updates to the catalog.

3. We don't truncate the clog during WAL replay, so the clog will grow
   during recovery. Nothing to do there to make things safe.

4. When InArchiveRecovery we should set all of the datminxid and
   datvacuumxid fields to be the Xid from where recovery started, so
   that clog is not truncated soon after recovery. Performing a VACUUM
   FREEZE after a recovery would be mentioned as an optional task at
   the end of a PITR recovery on a failover/second server.

5. At 3.5 billion records during recovery we should halt the replay, do
   a full database scan to set hint bits, truncate clog, then restart
   replay. (Automatically within the recovery process).

6. During WAL replay, put out a warning message every 1 billion rows
   saying that a hint bit scan will eventually be required if recovery
   continues.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com
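
To make suggestions 1, 5 and 6 a little more concrete, here is a rough sketch in Python rather than backend C. It is only an illustration under stated assumptions, not a proposed patch. `FreezeRecord`, `ReplayAction` and `replay_action` are hypothetical names and do not exist in the PostgreSQL source. The sketch also assumes that the "records" in suggestion 5 and the "rows" in suggestion 6 refer to one replay counter.

```python
from dataclasses import dataclass
from enum import Enum

# Thresholds taken from suggestions 5 and 6. Treating both as limits on
# the same replay counter is an assumption of this sketch.
WARN_INTERVAL = 1_000_000_000
HALT_THRESHOLD = 3_500_000_000


@dataclass(frozen=True)
class FreezeRecord:
    """Hypothetical payload of an XLOG_UTIL_FREEZE message (suggestion 1)."""
    relid: int       # relation whose block had tuples frozen
    blockid: int     # block number within that relation
    vacuum_xid: int  # the vacuum's xid, needed to redo the freeze


class ReplayAction(Enum):
    CONTINUE = "continue"
    WARN = "warn"                    # suggestion 6: warn, keep replaying
    HALT_AND_SCAN = "halt_and_scan"  # suggestion 5: stop, scan, truncate clog


def replay_action(replayed: int) -> ReplayAction:
    """Decide what recovery should do after `replayed` records."""
    if replayed >= HALT_THRESHOLD:
        return ReplayAction.HALT_AND_SCAN
    if replayed > 0 and replayed % WARN_INTERVAL == 0:
        return ReplayAction.WARN
    return ReplayAction.CONTINUE
```

With these numbers, a long recovery would see warnings at 1, 2 and 3 billion, then a forced hint bit scan and clog truncation at 3.5 billion, after which the counter would presumably restart from zero.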