On Mon, Apr 13, 2020 at 2:58 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Fri, Apr 3, 2020 at 2:22 PM Peter Geoghegan <p...@bowt.ie> wrote:
> > I think that it's worth considering whether or not there are a
> > significant number of "snapshot too old" users that rarely or never
> > rely on old snapshots used by new queries. Kevin said that this
> > happens "in some cases", but how many cases? Might it be that many
> > "snapshot too old" users could get by with a version of the feature
> > that makes the most conservative possible assumptions, totally giving
> > up on the idea of differentiating which blocks are truly safe to
> > access with an "old" snapshot? (In other words, one that assumes that
> > they're *all* unsafe for an "old" snapshot.)
> >
> > I'm thinking of a version of "snapshot too old" that amounts to a
> > statement timeout that gets applied for xmin horizon type purposes in
> > the conventional way, while only showing an error to the client if and
> > when they access literally any buffer (though not when the relation is
> > a system catalog). Is it possible that something along those lines is
> > appreciably better than nothing to users? If it is, and if we can find
> > a way to manage the transition, then maybe we could tolerate
> > supporting this greatly simplified implementation of "snapshot too
> > old".
>
> Interesting idea.  I'm keen to try prototyping it to see how well it
> works out it practice.  Let me know soon if you already have designs
> on that and I'll get out of your way, otherwise I'll give it a try and
> share what I come up with.

Here's a quick and dirty test patch of that idea (or my understanding
of it), just for experiments.  It introduces snapshot->expire_time and
a new timer SNAPSHOT_TIMEOUT to cause the next CHECK_FOR_INTERRUPTS()
to set snapshot->too_old on any active or registered snapshots whose
time has come, and then try to advance MyPgXact->xmin, without
considering the ones marked too old.  That gets rid of the concept of
"early pruning".  You can use just regular pruning, because the
snapshot is no longer holding the regular xmin back.  Then
TestForOldSnapshot() becomes simply if (snapshot->too_old)
ereport(...).

There are certainly some rough edges, missed details and bugs in here,
not least the fact (pointed out to me by Andres in an off-list chat)
that we sometimes use short-lived snapshots without registering them;
we'd have to fix that.  It also does nothing to ensure that
TestForOldSnapshot() is actually called at all the right places, which
is still required for correct results.

If those problems can be fixed, you'd have a situation where
snapshot-too-old is a coarse grained, blunt instrument that
effectively aborts your transaction even if the whole cluster is
read-only.  I am not sure if that's really truly useful to anyone (ie
if these ODBC cursor users would be satisfied; I'm not sure I
understand that use case).

Hmm.  I suppose it must be possible to put the LSN check back: if
(snapshot->too_old && PageGetLSN(page) > snapshot->lsn) ereport(...).
Then the granularity would be the same as today -- block level -- but
the complexity is transferred from the pruning side (has to deal with
xid time map) to the snapshot-owning side (has to deal with timers,
CFI() and make sure all snapshots are registered).  Maybe not a great
deal, and maybe not easier than fixing the existing bugs.

One problem is all the new setitimer() syscalls.  I feel like that
could be improved, as could statement_timeout, by letting existing
timers run rather than repeatedly rescheduling eagerly, so that eg a 1
minute timeout never gets rescheduled more than once per minute.  I
haven't looked into that, but I guess it's no worse than the existing
implement's overheads anyway.

PS in the patch the GUC is interpreted as milliseconds, which is more
fun for testing but it should really be minutes like before.
From 8a5455c7e376bd6ceddf956f789cfdede0396f3f Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Wed, 15 Apr 2020 12:35:51 +1200
Subject: [PATCH] Implement snapshot_too_old using a timer.

Remove the previous implementation of "snapshot too old", and replace it
with a very conservative implementation based only on time passing.

Use a timer to mark active and registered snapshots as too old and
potentially advance the backend's xmin.  Snapshots that reach this state
can no longer be used to read buffers, because data they need to see may
be vacuumed away.

XXX Incomplete experimental code!
---
 src/backend/access/common/toast_internals.c |  12 +-
 src/backend/access/heap/pruneheap.c         |   4 +-
 src/backend/catalog/index.c                 |  15 +-
 src/backend/commands/vacuum.c               |   3 +-
 src/backend/storage/buffer/bufmgr.c         |  17 -
 src/backend/storage/ipc/ipci.c              |   2 -
 src/backend/storage/ipc/procarray.c         |  23 +-
 src/backend/tcop/postgres.c                 |   6 +
 src/backend/utils/init/globals.c            |   1 +
 src/backend/utils/init/postinit.c           |  10 +
 src/backend/utils/time/snapmgr.c            | 610 ++++----------------
 src/include/miscadmin.h                     |   1 +
 src/include/storage/bufmgr.h                |  16 +-
 src/include/utils/snapmgr.h                 |  47 +-
 src/include/utils/snapshot.h                |   4 +-
 src/include/utils/timeout.h                 |   1 +
 16 files changed, 170 insertions(+), 602 deletions(-)

diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5ec6..66189df0cb 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -615,18 +615,10 @@ toast_close_indexes(Relation *toastidxs, int num_indexes, LOCKMODE lock)
 /* ----------
  * init_toast_snapshot
  *
- *	Initialize an appropriate TOAST snapshot.  We must use an MVCC snapshot
- *	to initialize the TOAST snapshot; since we don't know which one to use,
- *	just use the oldest one.  This is safe: at worst, we will get a "snapshot
- *	too old" error that might have been avoided otherwise.
+ *	Initialize an appropriate TOAST snapshot.
  */
 void
 init_toast_snapshot(Snapshot toast_snapshot)
 {
-	Snapshot	snapshot = GetOldestSnapshot();
-
-	if (snapshot == NULL)
-		elog(ERROR, "no known snapshots");
-
-	InitToastSnapshot(*toast_snapshot, snapshot->lsn, snapshot->whenTaken);
+	InitToastSnapshot(*toast_snapshot);
 }
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1794cfd8d9..c2d1ddb092 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -102,9 +102,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		RelationIsAccessibleInLogicalDecoding(relation))
 		OldestXmin = RecentGlobalXmin;
 	else
-		OldestXmin =
-			TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin,
-												relation);
+		OldestXmin = RecentGlobalDataXmin;
 
 	Assert(TransactionIdIsValid(OldestXmin));
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index bd7ec923e9..da40f4dbcf 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2919,8 +2919,7 @@ index_build(Relation heapRelation,
 	/*
 	 * If we found any potentially broken HOT chains, mark the index as not
 	 * being usable until the current transaction is below the event horizon.
-	 * See src/backend/access/heap/README.HOT for discussion.  Also set this
-	 * if early pruning/vacuuming is enabled for the heap relation.  While it
+	 * See src/backend/access/heap/README.HOT for discussion.  While it
 	 * might become safe to use the index earlier based on actual cleanup
 	 * activity and other active transactions, the test for that would be much
 	 * more complex and would require some form of blocking, so keep it simple
@@ -2936,7 +2935,7 @@ index_build(Relation heapRelation,
 	 *
 	 * We also need not set indcheckxmin during a concurrent index build,
 	 * because we won't set indisvalid true until all transactions that care
-	 * about the broken HOT chains or early pruning/vacuuming are gone.
+	 * about the broken HOT chains are gone.
 	 *
 	 * Therefore, this code path can only be taken during non-concurrent
 	 * CREATE INDEX.  Thus the fact that heap_update will set the pg_index
@@ -2945,7 +2944,7 @@ index_build(Relation heapRelation,
 	 * about any concurrent readers of the tuple; no other transaction can see
 	 * it yet.
 	 */
-	if ((indexInfo->ii_BrokenHotChain || EarlyPruningEnabled(heapRelation)) &&
+	if ((indexInfo->ii_BrokenHotChain) &&
 		!isreindex &&
 		!indexInfo->ii_Concurrent)
 	{
@@ -3584,7 +3583,6 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 		HeapTuple	indexTuple;
 		Form_pg_index indexForm;
 		bool		index_bad;
-		bool		early_pruning_enabled = EarlyPruningEnabled(heapRelation);
 
 		pg_index = table_open(IndexRelationId, RowExclusiveLock);
 
@@ -3598,12 +3596,11 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 					 !indexForm->indisready ||
 					 !indexForm->indislive);
 		if (index_bad ||
-			(indexForm->indcheckxmin && !indexInfo->ii_BrokenHotChain) ||
-			early_pruning_enabled)
+			(indexForm->indcheckxmin && !indexInfo->ii_BrokenHotChain))
 		{
-			if (!indexInfo->ii_BrokenHotChain && !early_pruning_enabled)
+			if (!indexInfo->ii_BrokenHotChain)
 				indexForm->indcheckxmin = false;
-			else if (index_bad || early_pruning_enabled)
+			else if (index_bad)
 				indexForm->indcheckxmin = true;
 			indexForm->indisvalid = true;
 			indexForm->indisready = true;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 3a89f8fe1e..8b3db22ca5 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -957,8 +957,7 @@ vacuum_set_xid_limits(Relation rel,
 	 * working on a particular table at any time, and that each vacuum is
 	 * always an independent transaction.
 	 */
-	*oldestXmin =
-		TransactionIdLimitedForOldSnapshots(GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM), rel);
+	*oldestXmin = GetOldestXmin(rel, true);
 
 	Assert(TransactionIdIsNormal(*oldestXmin));
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index f9980cf80c..5f017a19fd 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4538,20 +4538,3 @@ IssuePendingWritebacks(WritebackContext *context)
 
 	context->nr_pending = 0;
 }
-
-
-/*
- * Implement slower/larger portions of TestForOldSnapshot
- *
- * Smaller/faster portions are put inline, but the entire set of logic is too
- * big for that.
- */
-void
-TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
-{
-	if (RelationAllowsEarlyPruning(relation)
-		&& (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp())
-		ereport(ERROR,
-				(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
-				 errmsg("snapshot too old")));
-}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..eb18ae36cb 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -143,7 +143,6 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, WalSndShmemSize());
 		size = add_size(size, WalRcvShmemSize());
 		size = add_size(size, ApplyLauncherShmemSize());
-		size = add_size(size, SnapMgrShmemSize());
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
@@ -259,7 +258,6 @@ CreateSharedMemoryAndSemaphores(void)
 	/*
 	 * Set up other modules that need some shared memory space
 	 */
-	SnapMgrInit();
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 363000670b..3d6cfce5bf 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1758,27 +1758,8 @@ GetSnapshotData(Snapshot snapshot)
 	snapshot->active_count = 0;
 	snapshot->regd_count = 0;
 	snapshot->copied = false;
-
-	if (old_snapshot_threshold < 0)
-	{
-		/*
-		 * If not using "snapshot too old" feature, fill related fields with
-		 * dummy values that don't require any locking.
-		 */
-		snapshot->lsn = InvalidXLogRecPtr;
-		snapshot->whenTaken = 0;
-	}
-	else
-	{
-		/*
-		 * Capture the current time and WAL stream location in case this
-		 * snapshot becomes old enough to need to fall back on the special
-		 * "old snapshot" logic.
-		 */
-		snapshot->lsn = GetXLogInsertRecPtr();
-		snapshot->whenTaken = GetSnapshotCurrentTimestamp();
-		MaintainOldSnapshotTimeMapping(snapshot->whenTaken, xmin);
-	}
+	snapshot->too_old = false;
+	snapshot->expire_time = 0;
 
 	return snapshot;
 }
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8958ec8103..856fb2c5a7 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3204,6 +3204,12 @@ ProcessInterrupts(void)
 
 	}
 
+	if (SnapshotTimeoutPending)
+	{
+		HandleSnapshotTimeout();
+		SnapshotTimeoutPending = false;
+	}
+
 	if (ProcSignalBarrierPending)
 		ProcessProcSignalBarrier();
 
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index eb19644419..0bf0dafac5 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -32,6 +32,7 @@ volatile sig_atomic_t QueryCancelPending = false;
 volatile sig_atomic_t ProcDiePending = false;
 volatile sig_atomic_t ClientConnectionLost = false;
 volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false;
+volatile sig_atomic_t SnapshotTimeoutPending = false;
 volatile sig_atomic_t ProcSignalBarrierPending = false;
 volatile uint32 InterruptHoldoffCount = 0;
 volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index f4247ea70d..4b9f1128e0 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -73,6 +73,7 @@ static void ShutdownPostgres(int code, Datum arg);
 static void StatementTimeoutHandler(void);
 static void LockTimeoutHandler(void);
 static void IdleInTransactionSessionTimeoutHandler(void);
+static void SnapshotTimeoutHandler(void);
 static bool ThereIsAtLeastOneRole(void);
 static void process_startup_options(Port *port, bool am_superuser);
 static void process_settings(Oid databaseid, Oid roleid);
@@ -631,6 +632,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 		RegisterTimeout(LOCK_TIMEOUT, LockTimeoutHandler);
 		RegisterTimeout(IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
 						IdleInTransactionSessionTimeoutHandler);
+		RegisterTimeout(SNAPSHOT_TIMEOUT, SnapshotTimeoutHandler);
 	}
 
 	/*
@@ -1241,6 +1243,14 @@ IdleInTransactionSessionTimeoutHandler(void)
 	SetLatch(MyLatch);
 }
 
+static void
+SnapshotTimeoutHandler(void)
+{
+	SnapshotTimeoutPending = true;
+	InterruptPending = true;
+	SetLatch(MyLatch);
+}
+
 /*
  * Returns true if at least one role is defined in this database cluster.
  */
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 1c063c592c..4b40f596e0 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -67,6 +67,7 @@
 #include "utils/resowner_private.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
+#include "utils/timeout.h"
 
 
 /*
@@ -74,61 +75,6 @@
  */
 int			old_snapshot_threshold; /* number of minutes, -1 disables */
 
-/*
- * Structure for dealing with old_snapshot_threshold implementation.
- */
-typedef struct OldSnapshotControlData
-{
-	/*
-	 * Variables for old snapshot handling are shared among processes and are
-	 * only allowed to move forward.
-	 */
-	slock_t		mutex_current;	/* protect current_timestamp */
-	TimestampTz current_timestamp;	/* latest snapshot timestamp */
-	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
-	TransactionId latest_xmin;	/* latest snapshot xmin */
-	TimestampTz next_map_update;	/* latest snapshot valid up to */
-	slock_t		mutex_threshold;	/* protect threshold fields */
-	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
-	TransactionId threshold_xid;	/* earlier xid may be gone */
-
-	/*
-	 * Keep one xid per minute for old snapshot error handling.
-	 *
-	 * Use a circular buffer with a head offset, a count of entries currently
-	 * used, and a timestamp corresponding to the xid at the head offset.  A
-	 * count_used value of zero means that there are no times stored; a
-	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
-	 * is full and the head must be advanced to add new entries.  Use
-	 * timestamps aligned to minute boundaries, since that seems less
-	 * surprising than aligning based on the first usage timestamp.  The
-	 * latest bucket is effectively stored within latest_xmin.  The circular
-	 * buffer is updated when we get a new xmin value that doesn't fall into
-	 * the same interval.
-	 *
-	 * It is OK if the xid for a given time slot is from earlier than
-	 * calculated by adding the number of minutes corresponding to the
-	 * (possibly wrapped) distance from the head offset to the time of the
-	 * head entry, since that just results in the vacuuming of old tuples
-	 * being slightly less aggressive.  It would not be OK for it to be off in
-	 * the other direction, since it might result in vacuuming tuples that are
-	 * still expected to be there.
-	 *
-	 * Use of an SLRU was considered but not chosen because it is more
-	 * heavyweight than is needed for this, and would probably not be any less
-	 * code to implement.
-	 *
-	 * Persistence is not needed.
-	 */
-	int			head_offset;	/* subscript of oldest tracked time */
-	TimestampTz head_timestamp; /* time corresponding to head xid */
-	int			count_used;		/* how many slots are in use */
-	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
-} OldSnapshotControlData;
-
-static volatile OldSnapshotControlData *oldSnapshotControl;
-
-
 /*
  * CurrentSnapshot points to the only snapshot taken in transaction-snapshot
  * mode, and to the latest one taken in a read-committed transaction.
@@ -189,9 +135,6 @@ typedef struct ActiveSnapshotElt
 /* Top of the stack of active snapshots */
 static ActiveSnapshotElt *ActiveSnapshot = NULL;
 
-/* Bottom of the stack of active snapshots */
-static ActiveSnapshotElt *OldestActiveSnapshot = NULL;
-
 /*
  * Currently registered Snapshots.  Ordered in a heap by xmin, so that we can
  * quickly find the one with lowest xmin, to advance our MyPgXact->xmin.
@@ -225,10 +168,10 @@ typedef struct ExportedSnapshot
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
-static TimestampTz AlignTimestampToMinuteBoundary(TimestampTz ts);
 static Snapshot CopySnapshot(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
+static void SetSnapshotExpireTime(Snapshot snapshot);
 
 /*
  * Snapshot fields to be serialized.
@@ -245,54 +188,8 @@ typedef struct SerializedSnapshotData
 	bool		suboverflowed;
 	bool		takenDuringRecovery;
 	CommandId	curcid;
-	TimestampTz whenTaken;
-	XLogRecPtr	lsn;
 } SerializedSnapshotData;
 
-Size
-SnapMgrShmemSize(void)
-{
-	Size		size;
-
-	size = offsetof(OldSnapshotControlData, xid_by_minute);
-	if (old_snapshot_threshold > 0)
-		size = add_size(size, mul_size(sizeof(TransactionId),
-									   OLD_SNAPSHOT_TIME_MAP_ENTRIES));
-
-	return size;
-}
-
-/*
- * Initialize for managing old snapshot detection.
- */
-void
-SnapMgrInit(void)
-{
-	bool		found;
-
-	/*
-	 * Create or attach to the OldSnapshotControlData structure.
-	 */
-	oldSnapshotControl = (volatile OldSnapshotControlData *)
-		ShmemInitStruct("OldSnapshotControlData",
-						SnapMgrShmemSize(), &found);
-
-	if (!found)
-	{
-		SpinLockInit(&oldSnapshotControl->mutex_current);
-		oldSnapshotControl->current_timestamp = 0;
-		SpinLockInit(&oldSnapshotControl->mutex_latest_xmin);
-		oldSnapshotControl->latest_xmin = InvalidTransactionId;
-		oldSnapshotControl->next_map_update = 0;
-		SpinLockInit(&oldSnapshotControl->mutex_threshold);
-		oldSnapshotControl->threshold_timestamp = 0;
-		oldSnapshotControl->threshold_xid = InvalidTransactionId;
-		oldSnapshotControl->head_offset = 0;
-		oldSnapshotControl->head_timestamp = 0;
-		oldSnapshotControl->count_used = 0;
-	}
-}
-
 /*
  * GetTransactionSnapshot
  *		Get the appropriate snapshot for a new query in a transaction.
@@ -353,6 +250,7 @@ GetTransactionSnapshot(void)
 			/* Mark it as "registered" in FirstXactSnapshot */
 			FirstXactSnapshot->regd_count++;
 			pairingheap_add(&RegisteredSnapshots, &FirstXactSnapshot->ph_node);
+			SetSnapshotExpireTime(FirstXactSnapshot);
 		}
 		else
 			CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
@@ -403,36 +301,6 @@ GetLatestSnapshot(void)
 	return SecondarySnapshot;
 }
 
-/*
- * GetOldestSnapshot
- *
- *		Get the transaction's oldest known snapshot, as judged by the LSN.
- *		Will return NULL if there are no active or registered snapshots.
- */
-Snapshot
-GetOldestSnapshot(void)
-{
-	Snapshot	OldestRegisteredSnapshot = NULL;
-	XLogRecPtr	RegisteredLSN = InvalidXLogRecPtr;
-
-	if (!pairingheap_is_empty(&RegisteredSnapshots))
-	{
-		OldestRegisteredSnapshot = pairingheap_container(SnapshotData, ph_node,
-														 pairingheap_first(&RegisteredSnapshots));
-		RegisteredLSN = OldestRegisteredSnapshot->lsn;
-	}
-
-	if (OldestActiveSnapshot != NULL)
-	{
-		XLogRecPtr	ActiveLSN = OldestActiveSnapshot->as_snap->lsn;
-
-		if (XLogRecPtrIsInvalid(RegisteredLSN) || RegisteredLSN > ActiveLSN)
-			return OldestActiveSnapshot->as_snap;
-	}
-
-	return OldestRegisteredSnapshot;
-}
-
 /*
  * GetCatalogSnapshot
  *		Get a snapshot that is sufficiently up-to-date for scan of the
@@ -493,7 +361,8 @@ GetNonHistoricCatalogSnapshot(Oid relid)
 		 * CatalogSnapshot pointer is already valid.
 		 */
 		pairingheap_add(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
-	}
+		SetSnapshotExpireTime(CatalogSnapshot);
+	 }
 
 	return CatalogSnapshot;
 }
@@ -513,7 +382,8 @@ InvalidateCatalogSnapshot(void)
 {
 	if (CatalogSnapshot)
 	{
-		pairingheap_remove(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
+		if (!CatalogSnapshot->too_old)
+			pairingheap_remove(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
 		CatalogSnapshot = NULL;
 		SnapshotResetXmin();
 	}
@@ -646,6 +516,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
 		/* Mark it as "registered" in FirstXactSnapshot */
 		FirstXactSnapshot->regd_count++;
 		pairingheap_add(&RegisteredSnapshots, &FirstXactSnapshot->ph_node);
+		SetSnapshotExpireTime(FirstXactSnapshot);
 	}
 
 	FirstSnapshotSet = true;
@@ -680,6 +551,9 @@ CopySnapshot(Snapshot snapshot)
 	newsnap->active_count = 0;
 	newsnap->copied = true;
 
+	newsnap->too_old = snapshot->too_old;
+	newsnap->expire_time = snapshot->expire_time;
+
 	/* setup XID array */
 	if (snapshot->xcnt > 0)
 	{
@@ -755,8 +629,8 @@ PushActiveSnapshot(Snapshot snap)
 	newactive->as_snap->active_count++;
 
 	ActiveSnapshot = newactive;
-	if (OldestActiveSnapshot == NULL)
-		OldestActiveSnapshot = ActiveSnapshot;
+
+	SetSnapshotExpireTime(newactive->as_snap);
 }
 
 /*
@@ -827,8 +701,6 @@ PopActiveSnapshot(void)
 
 	pfree(ActiveSnapshot);
 	ActiveSnapshot = newstack;
-	if (ActiveSnapshot == NULL)
-		OldestActiveSnapshot = NULL;
 
 	SnapshotResetXmin();
 }
@@ -891,7 +763,10 @@ RegisterSnapshotOnOwner(Snapshot snapshot, ResourceOwner owner)
 	ResourceOwnerRememberSnapshot(owner, snap);
 
 	if (snap->regd_count == 1)
+	{
 		pairingheap_add(&RegisteredSnapshots, &snap->ph_node);
+		SetSnapshotExpireTime(snap);
+	}
 
 	return snap;
 }
@@ -928,7 +803,7 @@ UnregisterSnapshotFromOwner(Snapshot snapshot, ResourceOwner owner)
 	ResourceOwnerForgetSnapshot(owner, snapshot);
 
 	snapshot->regd_count--;
-	if (snapshot->regd_count == 0)
+	if (snapshot->regd_count == 0 && !snapshot->too_old)
 		pairingheap_remove(&RegisteredSnapshots, &snapshot->ph_node);
 
 	if (snapshot->regd_count == 0 && snapshot->active_count == 0)
@@ -998,13 +873,6 @@ GetFullRecentGlobalXmin(void)
  * dropped.  For efficiency, we only consider recomputing PGXACT->xmin when
  * the active snapshot stack is empty; this allows us not to need to track
  * which active snapshot is oldest.
- *
- * Note: it's tempting to use GetOldestSnapshot() here so that we can include
- * active snapshots in the calculation.  However, that compares by LSN not
- * xmin so it's not entirely clear that it's the same thing.  Also, we'd be
- * critically dependent on the assumption that the bottommost active snapshot
- * stack entry has the oldest xmin.  (Current uses of GetOldestSnapshot() are
- * not actually critical, but this would be.)
  */
 static void
 SnapshotResetXmin(void)
@@ -1076,8 +944,6 @@ AtSubAbort_Snapshot(int level)
 		pfree(ActiveSnapshot);
 
 		ActiveSnapshot = next;
-		if (ActiveSnapshot == NULL)
-			OldestActiveSnapshot = NULL;
 	}
 
 	SnapshotResetXmin();
@@ -1099,7 +965,7 @@ AtEOXact_Snapshot(bool isCommit, bool resetXmin)
 	 * stacked as active, we don't want the code below to be chasing through a
 	 * dangling pointer.
 	 */
-	if (FirstXactSnapshot != NULL)
+	if (FirstXactSnapshot != NULL && !FirstXactSnapshot->too_old)
 	{
 		Assert(FirstXactSnapshot->regd_count > 0);
 		Assert(!pairingheap_is_empty(&RegisteredSnapshots));
@@ -1133,6 +999,9 @@ AtEOXact_Snapshot(bool isCommit, bool resetXmin)
 				elog(WARNING, "could not unlink file \"%s\": %m",
 					 esnap->snapfile);
 
+			if (esnap->snapshot->too_old)
+				continue;
+
 			pairingheap_remove(&RegisteredSnapshots,
 							   &esnap->snapshot->ph_node);
 		}
@@ -1161,7 +1030,6 @@ AtEOXact_Snapshot(bool isCommit, bool resetXmin)
 	 * it'll go away with TopTransactionContext.
 	 */
 	ActiveSnapshot = NULL;
-	OldestActiveSnapshot = NULL;
 	pairingheap_reset(&RegisteredSnapshots);
 
 	CurrentSnapshot = NULL;
@@ -1262,6 +1130,7 @@ ExportSnapshot(Snapshot snapshot)
 
 	snapshot->regd_count++;
 	pairingheap_add(&RegisteredSnapshots, &snapshot->ph_node);
+	SetSnapshotExpireTime(snapshot);
 
 	/*
 	 * Fill buf with a text serialization of the snapshot, plus identification
@@ -1696,335 +1565,6 @@ ThereAreNoPriorRegisteredSnapshots(void)
 }
 
 
-/*
- * Return a timestamp that is exactly on a minute boundary.
- *
- * If the argument is already aligned, return that value, otherwise move to
- * the next minute boundary following the given time.
- */
-static TimestampTz
-AlignTimestampToMinuteBoundary(TimestampTz ts)
-{
-	TimestampTz retval = ts + (USECS_PER_MINUTE - 1);
-
-	return retval - (retval % USECS_PER_MINUTE);
-}
-
-/*
- * Get current timestamp for snapshots
- *
- * This is basically GetCurrentTimestamp(), but with a guarantee that
- * the result never moves backward.
- */
-TimestampTz
-GetSnapshotCurrentTimestamp(void)
-{
-	TimestampTz now = GetCurrentTimestamp();
-
-	/*
-	 * Don't let time move backward; if it hasn't advanced, use the old value.
-	 */
-	SpinLockAcquire(&oldSnapshotControl->mutex_current);
-	if (now <= oldSnapshotControl->current_timestamp)
-		now = oldSnapshotControl->current_timestamp;
-	else
-		oldSnapshotControl->current_timestamp = now;
-	SpinLockRelease(&oldSnapshotControl->mutex_current);
-
-	return now;
-}
-
-/*
- * Get timestamp through which vacuum may have processed based on last stored
- * value for threshold_timestamp.
- *
- * XXX: So far, we never trust that a 64-bit value can be read atomically; if
- * that ever changes, we could get rid of the spinlock here.
- */
-TimestampTz
-GetOldSnapshotThresholdTimestamp(void)
-{
-	TimestampTz threshold_timestamp;
-
-	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
-	threshold_timestamp = oldSnapshotControl->threshold_timestamp;
-	SpinLockRelease(&oldSnapshotControl->mutex_threshold);
-
-	return threshold_timestamp;
-}
-
-static void
-SetOldSnapshotThresholdTimestamp(TimestampTz ts, TransactionId xlimit)
-{
-	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
-	oldSnapshotControl->threshold_timestamp = ts;
-	oldSnapshotControl->threshold_xid = xlimit;
-	SpinLockRelease(&oldSnapshotControl->mutex_threshold);
-}
-
-/*
- * TransactionIdLimitedForOldSnapshots
- *
- * Apply old snapshot limit, if any.  This is intended to be called for page
- * pruning and table vacuuming, to allow old_snapshot_threshold to override
- * the normal global xmin value.  Actual testing for snapshot too old will be
- * based on whether a snapshot timestamp is prior to the threshold timestamp
- * set in this function.
- */
-TransactionId
-TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
-									Relation relation)
-{
-	if (TransactionIdIsNormal(recentXmin)
-		&& old_snapshot_threshold >= 0
-		&& RelationAllowsEarlyPruning(relation))
-	{
-		TimestampTz ts = GetSnapshotCurrentTimestamp();
-		TransactionId xlimit = recentXmin;
-		TransactionId latest_xmin;
-		TimestampTz update_ts;
-		bool		same_ts_as_threshold = false;
-
-		SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
-		latest_xmin = oldSnapshotControl->latest_xmin;
-		update_ts = oldSnapshotControl->next_map_update;
-		SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
-
-		/*
-		 * Zero threshold always overrides to latest xmin, if valid.  Without
-		 * some heuristic it will find its own snapshot too old on, for
-		 * example, a simple UPDATE -- which would make it useless for most
-		 * testing, but there is no principled way to ensure that it doesn't
-		 * fail in this way.  Use a five-second delay to try to get useful
-		 * testing behavior, but this may need adjustment.
-		 */
-		if (old_snapshot_threshold == 0)
-		{
-			if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
-				&& TransactionIdFollows(latest_xmin, xlimit))
-				xlimit = latest_xmin;
-
-			ts -= 5 * USECS_PER_SEC;
-			SetOldSnapshotThresholdTimestamp(ts, xlimit);
-
-			return xlimit;
-		}
-
-		ts = AlignTimestampToMinuteBoundary(ts)
-			- (old_snapshot_threshold * USECS_PER_MINUTE);
-
-		/* Check for fast exit without LW locking. */
-		SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
-		if (ts == oldSnapshotControl->threshold_timestamp)
-		{
-			xlimit = oldSnapshotControl->threshold_xid;
-			same_ts_as_threshold = true;
-		}
-		SpinLockRelease(&oldSnapshotControl->mutex_threshold);
-
-		if (!same_ts_as_threshold)
-		{
-			if (ts == update_ts)
-			{
-				xlimit = latest_xmin;
-				if (NormalTransactionIdFollows(xlimit, recentXmin))
-					SetOldSnapshotThresholdTimestamp(ts, xlimit);
-			}
-			else
-			{
-				LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
-
-				if (oldSnapshotControl->count_used > 0
-					&& ts >= oldSnapshotControl->head_timestamp)
-				{
-					int			offset;
-
-					offset = ((ts - oldSnapshotControl->head_timestamp)
-							  / USECS_PER_MINUTE);
-					if (offset > oldSnapshotControl->count_used - 1)
-						offset = oldSnapshotControl->count_used - 1;
-					offset = (oldSnapshotControl->head_offset + offset)
-						% OLD_SNAPSHOT_TIME_MAP_ENTRIES;
-					xlimit = oldSnapshotControl->xid_by_minute[offset];
-
-					if (NormalTransactionIdFollows(xlimit, recentXmin))
-						SetOldSnapshotThresholdTimestamp(ts, xlimit);
-				}
-
-				LWLockRelease(OldSnapshotTimeMapLock);
-			}
-		}
-
-		/*
-		 * Failsafe protection against vacuuming work of active transaction.
-		 *
-		 * This is not an assertion because we avoid the spinlock for
-		 * performance, leaving open the possibility that xlimit could advance
-		 * and be more current; but it seems prudent to apply this limit.  It
-		 * might make pruning a tiny bit less aggressive than it could be, but
-		 * protects against data loss bugs.
-		 */
-		if (TransactionIdIsNormal(latest_xmin)
-			&& TransactionIdPrecedes(latest_xmin, xlimit))
-			xlimit = latest_xmin;
-
-		if (NormalTransactionIdFollows(xlimit, recentXmin))
-			return xlimit;
-	}
-
-	return recentXmin;
-}
-
-/*
- * Take care of the circular buffer that maps time to xid.
- */
-void
-MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
-{
-	TimestampTz ts;
-	TransactionId latest_xmin;
-	TimestampTz update_ts;
-	bool		map_update_required = false;
-
-	/* Never call this function when old snapshot checking is disabled. */
-	Assert(old_snapshot_threshold >= 0);
-
-	ts = AlignTimestampToMinuteBoundary(whenTaken);
-
-	/*
-	 * Keep track of the latest xmin seen by any process. Update mapping with
-	 * a new value when we have crossed a bucket boundary.
-	 */
-	SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
-	latest_xmin = oldSnapshotControl->latest_xmin;
-	update_ts = oldSnapshotControl->next_map_update;
-	if (ts > update_ts)
-	{
-		oldSnapshotControl->next_map_update = ts;
-		map_update_required = true;
-	}
-	if (TransactionIdFollows(xmin, latest_xmin))
-		oldSnapshotControl->latest_xmin = xmin;
-	SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
-
-	/* We only needed to update the most recent xmin value. */
-	if (!map_update_required)
-		return;
-
-	/* No further tracking needed for 0 (used for testing). */
-	if (old_snapshot_threshold == 0)
-		return;
-
-	/*
-	 * We don't want to do something stupid with unusual values, but we don't
-	 * want to litter the log with warnings or break otherwise normal
-	 * processing for this feature; so if something seems unreasonable, just
-	 * log at DEBUG level and return without doing anything.
-	 */
-	if (whenTaken < 0)
-	{
-		elog(DEBUG1,
-			 "MaintainOldSnapshotTimeMapping called with negative whenTaken = %ld",
-			 (long) whenTaken);
-		return;
-	}
-	if (!TransactionIdIsNormal(xmin))
-	{
-		elog(DEBUG1,
-			 "MaintainOldSnapshotTimeMapping called with xmin = %lu",
-			 (unsigned long) xmin);
-		return;
-	}
-
-	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
-
-	Assert(oldSnapshotControl->head_offset >= 0);
-	Assert(oldSnapshotControl->head_offset < OLD_SNAPSHOT_TIME_MAP_ENTRIES);
-	Assert((oldSnapshotControl->head_timestamp % USECS_PER_MINUTE) == 0);
-	Assert(oldSnapshotControl->count_used >= 0);
-	Assert(oldSnapshotControl->count_used <= OLD_SNAPSHOT_TIME_MAP_ENTRIES);
-
-	if (oldSnapshotControl->count_used == 0)
-	{
-		/* set up first entry for empty mapping */
-		oldSnapshotControl->head_offset = 0;
-		oldSnapshotControl->head_timestamp = ts;
-		oldSnapshotControl->count_used = 1;
-		oldSnapshotControl->xid_by_minute[0] = xmin;
-	}
-	else if (ts < oldSnapshotControl->head_timestamp)
-	{
-		/* old ts; log it at DEBUG */
-		LWLockRelease(OldSnapshotTimeMapLock);
-		elog(DEBUG1,
-			 "MaintainOldSnapshotTimeMapping called with old whenTaken = %ld",
-			 (long) whenTaken);
-		return;
-	}
-	else if (ts <= (oldSnapshotControl->head_timestamp +
-					((oldSnapshotControl->count_used - 1)
-					 * USECS_PER_MINUTE)))
-	{
-		/* existing mapping; advance xid if possible */
-		int			bucket = (oldSnapshotControl->head_offset
-							  + ((ts - oldSnapshotControl->head_timestamp)
-								 / USECS_PER_MINUTE))
-		% OLD_SNAPSHOT_TIME_MAP_ENTRIES;
-
-		if (TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[bucket], xmin))
-			oldSnapshotControl->xid_by_minute[bucket] = xmin;
-	}
-	else
-	{
-		/* We need a new bucket, but it might not be the very next one. */
-		int			advance = ((ts - oldSnapshotControl->head_timestamp)
-							   / USECS_PER_MINUTE);
-
-		oldSnapshotControl->head_timestamp = ts;
-
-		if (advance >= OLD_SNAPSHOT_TIME_MAP_ENTRIES)
-		{
-			/* Advance is so far that all old data is junk; start over. */
-			oldSnapshotControl->head_offset = 0;
-			oldSnapshotControl->count_used = 1;
-			oldSnapshotControl->xid_by_minute[0] = xmin;
-		}
-		else
-		{
-			/* Store the new value in one or more buckets. */
-			int			i;
-
-			for (i = 0; i < advance; i++)
-			{
-				if (oldSnapshotControl->count_used == OLD_SNAPSHOT_TIME_MAP_ENTRIES)
-				{
-					/* Map full and new value replaces old head. */
-					int			old_head = oldSnapshotControl->head_offset;
-
-					if (old_head == (OLD_SNAPSHOT_TIME_MAP_ENTRIES - 1))
-						oldSnapshotControl->head_offset = 0;
-					else
-						oldSnapshotControl->head_offset = old_head + 1;
-					oldSnapshotControl->xid_by_minute[old_head] = xmin;
-				}
-				else
-				{
-					/* Extend map to unused entry. */
-					int			new_tail = (oldSnapshotControl->head_offset
-											+ oldSnapshotControl->count_used)
-					% OLD_SNAPSHOT_TIME_MAP_ENTRIES;
-
-					oldSnapshotControl->count_used++;
-					oldSnapshotControl->xid_by_minute[new_tail] = xmin;
-				}
-			}
-		}
-	}
-
-	LWLockRelease(OldSnapshotTimeMapLock);
-}
-
-
 /*
  * Setup a snapshot that replaces normal catalog snapshots that allows catalog
  * access to behave just like it did at a certain point in the past.
@@ -2113,8 +1653,6 @@ SerializeSnapshot(Snapshot snapshot, char *start_address)
 	serialized_snapshot.suboverflowed = snapshot->suboverflowed;
 	serialized_snapshot.takenDuringRecovery = snapshot->takenDuringRecovery;
 	serialized_snapshot.curcid = snapshot->curcid;
-	serialized_snapshot.whenTaken = snapshot->whenTaken;
-	serialized_snapshot.lsn = snapshot->lsn;
 
 	/*
 	 * Ignore the SubXID array if it has overflowed, unless the snapshot was
@@ -2187,8 +1725,6 @@ RestoreSnapshot(char *start_address)
 	snapshot->suboverflowed = serialized_snapshot.suboverflowed;
 	snapshot->takenDuringRecovery = serialized_snapshot.takenDuringRecovery;
 	snapshot->curcid = serialized_snapshot.curcid;
-	snapshot->whenTaken = serialized_snapshot.whenTaken;
-	snapshot->lsn = serialized_snapshot.lsn;
 
 	/* Copy XIDs, if present. */
 	if (serialized_snapshot.xcnt > 0)
@@ -2212,6 +1748,9 @@ RestoreSnapshot(char *start_address)
 	snapshot->active_count = 0;
 	snapshot->copied = true;
 
+	snapshot->too_old = false;
+	snapshot->expire_time = 0;
+
 	return snapshot;
 }
 
@@ -2348,3 +1887,102 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 
 	return false;
 }
+
+/*
+ * Find any snapshots that are too old and mark them as such, and see if we can
+ * advance this backend's xmin.
+ */
+void HandleSnapshotTimeout(void)
+{
+	TransactionId		xmin = InvalidTransactionId;
+	TimestampTz			now = GetCurrentTimestamp();
+	TimestampTz			next_expire_time = 0;
+
+	/*
+	 * See if we can mark any registered snapshots as too old, and unregister
+	 * them.
+	 *
+	 * XXX This assumes that two registered snapshots with same xmin are sorted
+	 * in expire_time order, which isn't true right now.
+	 */
+	while (!pairingheap_is_empty(&RegisteredSnapshots))
+	{
+		Snapshot snapshot;
+
+		snapshot = pairingheap_container(SnapshotData, ph_node,
+										 pairingheap_first(&RegisteredSnapshots));
+		if (snapshot->expire_time == 0 || snapshot->expire_time > now)
+		{
+			xmin = snapshot->xmin;
+			next_expire_time = snapshot->expire_time;
+			break;
+		}
+
+		/* Forcibly deregister the snapshot so that it no longer holds up xmin. */
+		snapshot->too_old = true;
+		pairingheap_remove_first(&RegisteredSnapshots);
+	}
+
+	/*
+	 * Also scan the active snapshots, seeing if any need to be marked too old,
+	 * and find the new xmin.
+	 */
+	for (ActiveSnapshotElt *s = ActiveSnapshot; s; s = s->as_next)
+	{
+		if (s->as_snap->too_old)
+			continue;
+		else if (s->as_snap->expire_time > 0 && s->as_snap->expire_time <= now)
+			 s->as_snap->too_old = true;
+		else
+		{
+			/* See if we have a new lowest xmin. */
+			if (xmin == InvalidTransactionId ||
+				TransactionIdPrecedes(s->as_snap->xmin, xmin))
+				xmin = s->as_snap->xmin;
+			/* See if we have a new earliest expire_time. */
+			if (next_expire_time == 0 ||
+				s->as_snap->expire_time < next_expire_time)
+				next_expire_time = s->as_snap->expire_time;
+		}
+	}
+
+	/* Set timer for the next snapshot that will expire. */
+	if (next_expire_time > 0)
+		enable_timeout_after(SNAPSHOT_TIMEOUT, (now - next_expire_time) / 1000);
+
+	/* Advertise new xmin. */
+	MyPgXact->xmin = xmin;
+}
+
+/*
+ * SetSnapshotExpireTime
+ *		Configure the time at which a snapshot becomes too old.
+ *
+ * This should be called whenever a snapshot is made active or registered.
+ */
+static void
+SetSnapshotExpireTime(Snapshot snapshot)
+{
+	/*
+	 * If this snapshot was already marked as too old (perhaps because it was
+	 * copied from one that was too old), then don't clobber the flag.
+	 */
+	if (snapshot->too_old)
+		return;
+
+	snapshot->too_old = false;
+	if (old_snapshot_threshold < 0)
+	{
+		/* Feature not enabled. */
+		snapshot->expire_time = 0;
+	}
+	else
+	{
+		TimestampTz		now = GetCurrentTimestamp();
+
+		/* Set up a timer that will call HandleSnapshotTimeout. */
+		snapshot->expire_time = now + old_snapshot_threshold * 1000;
+		enable_timeout_after(SNAPSHOT_TIMEOUT,
+							 (snapshot->expire_time - now) / 1000);
+	}
+}
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 14fa127ab1..1f866302b6 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -82,6 +82,7 @@ extern PGDLLIMPORT volatile sig_atomic_t InterruptPending;
 extern PGDLLIMPORT volatile sig_atomic_t QueryCancelPending;
 extern PGDLLIMPORT volatile sig_atomic_t ProcDiePending;
 extern PGDLLIMPORT volatile sig_atomic_t IdleInTransactionSessionTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t SnapshotTimeoutPending;
 extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
 
 extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index ee91b8fa26..d7fb0dbb6f 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -238,8 +238,6 @@ extern bool BgBufferSync(struct WritebackContext *wb_context);
 
 extern void AtProcExit_LocalBuffers(void);
 
-extern void TestForOldSnapshot_impl(Snapshot snapshot, Relation relation);
-
 /* in freelist.c */
 extern BufferAccessStrategy GetAccessStrategy(BufferAccessStrategyType btype);
 extern void FreeAccessStrategy(BufferAccessStrategy strategy);
@@ -269,22 +267,16 @@ extern void FreeAccessStrategy(BufferAccessStrategy strategy);
  * Note that a NULL snapshot argument is allowed and causes a fast return
  * without error; this is to support call sites which can be called from
  * either scans or index modification areas.
- *
- * For best performance, keep the tests that are fastest and/or most likely to
- * exclude a page from old snapshot testing near the front.
  */
 static inline void
 TestForOldSnapshot(Snapshot snapshot, Relation relation, Page page)
 {
 	Assert(relation != NULL);
 
-	if (old_snapshot_threshold >= 0
-		&& (snapshot) != NULL
-		&& ((snapshot)->snapshot_type == SNAPSHOT_MVCC
-			|| (snapshot)->snapshot_type == SNAPSHOT_TOAST)
-		&& !XLogRecPtrIsInvalid((snapshot)->lsn)
-		&& PageGetLSN(page) > (snapshot)->lsn)
-		TestForOldSnapshot_impl(snapshot, relation);
+	if (snapshot && snapshot->too_old)
+		ereport(ERROR,
+				(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
+				 errmsg("snapshot too old")));
 }
 
 #endif							/* FRONTEND */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index b28d13ce84..cf7cd197e9 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -19,40 +19,10 @@
 #include "utils/snapshot.h"
 
 
-/*
- * The structure used to map times to TransactionId values for the "snapshot
- * too old" feature must have a few entries at the tail to hold old values;
- * otherwise the lookup will often fail and the expected early pruning or
- * vacuum will not usually occur.  It is best if this padding is for a number
- * of minutes greater than a thread would normally be stalled, but it's OK if
- * early vacuum opportunities are occasionally missed, so there's no need to
- * use an extreme value or get too fancy.  10 minutes seems plenty.
- */
-#define OLD_SNAPSHOT_PADDING_ENTRIES 10
-#define OLD_SNAPSHOT_TIME_MAP_ENTRIES (old_snapshot_threshold + OLD_SNAPSHOT_PADDING_ENTRIES)
-
-/*
- * Common definition of relation properties that allow early pruning/vacuuming
- * when old_snapshot_threshold >= 0.
- */
-#define RelationAllowsEarlyPruning(rel) \
-( \
-	 RelationNeedsWAL(rel) \
-  && !IsCatalogRelation(rel) \
-  && !RelationIsAccessibleInLogicalDecoding(rel) \
-)
-
-#define EarlyPruningEnabled(rel) (old_snapshot_threshold >= 0 && RelationAllowsEarlyPruning(rel))
-
 /* GUC variables */
 extern PGDLLIMPORT int old_snapshot_threshold;
 
 
-extern Size SnapMgrShmemSize(void);
-extern void SnapMgrInit(void);
-extern TimestampTz GetSnapshotCurrentTimestamp(void);
-extern TimestampTz GetOldSnapshotThresholdTimestamp(void);
-
 extern bool FirstSnapshotSet;
 
 extern PGDLLIMPORT TransactionId TransactionXmin;
@@ -74,7 +44,8 @@ extern PGDLLIMPORT SnapshotData CatalogSnapshotData;
  * local variable of type SnapshotData, and initialize it with this macro.
  */
 #define InitDirtySnapshot(snapshotdata)  \
-	((snapshotdata).snapshot_type = SNAPSHOT_DIRTY)
+	((snapshotdata).snapshot_type = SNAPSHOT_DIRTY, \
+	 (snapshotdata).too_old = false)
 
 /*
  * Similarly, some initialization is required for a NonVacuumable snapshot.
@@ -82,16 +53,15 @@ extern PGDLLIMPORT SnapshotData CatalogSnapshotData;
  */
 #define InitNonVacuumableSnapshot(snapshotdata, xmin_horizon)  \
 	((snapshotdata).snapshot_type = SNAPSHOT_NON_VACUUMABLE, \
-	 (snapshotdata).xmin = (xmin_horizon))
+	 (snapshotdata).xmin = (xmin_horizon), \
+	 (snapshotdata).too_old = false)
 
 /*
- * Similarly, some initialization is required for SnapshotToast.  We need
- * to set lsn and whenTaken correctly to support snapshot_too_old.
+ * Similarly, some initialization is required for SnapshotToast.
  */
-#define InitToastSnapshot(snapshotdata, l, w)  \
+#define InitToastSnapshot(snapshotdata)  \
 	((snapshotdata).snapshot_type = SNAPSHOT_TOAST, \
-	 (snapshotdata).lsn = (l),					\
-	 (snapshotdata).whenTaken = (w))
+	 (snapshotdata).too_old = false)
 
 /* This macro encodes the knowledge of which snapshots are MVCC-safe */
 #define IsMVCCSnapshot(snapshot)  \
@@ -102,7 +72,8 @@ extern PGDLLIMPORT SnapshotData CatalogSnapshotData;
 extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
-extern Snapshot GetOldestSnapshot(void);
+
+extern void HandleSnapshotTimeout(void);
 
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index 4796edb63a..e701c8b9b1 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -199,8 +199,8 @@ typedef struct SnapshotData
 	uint32		regd_count;		/* refcount on RegisteredSnapshots */
 	pairingheap_node ph_node;	/* link in the RegisteredSnapshots heap */
 
-	TimestampTz whenTaken;		/* timestamp when snapshot was taken */
-	XLogRecPtr	lsn;			/* position in the WAL stream when taken */
+	TimestampTz	expire_time;	/* when will this snapshot become too old? */
+	bool		too_old;
 } SnapshotData;
 
 #endif							/* SNAPSHOT_H */
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 83a15f6795..eb705a3f8f 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -31,6 +31,7 @@ typedef enum TimeoutId
 	STANDBY_TIMEOUT,
 	STANDBY_LOCK_TIMEOUT,
 	IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
+	SNAPSHOT_TIMEOUT,
 	/* First user-definable timeout reason */
 	USER_TIMEOUT,
 	/* Maximum number of timeout reasons */
-- 
2.20.1

Reply via email to