A long time ago, in a galaxy far away, we discussed ways to speed up
data loads/COPY.
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php

In particular, the idea that we could mark tuples as committed while
we are still loading them, to avoid negative behaviour for the first
reader.

Simple patch to implement this is attached, together with test case.

Current behaviour is shown here
Run COPY and then... SELECT count(*) FROM table with no indexes
1st SELECT Time: 1518.571 ms <--- slowed dramatically by setting hint bits
2nd SELECT Time: 914.141 ms
3rd SELECT Time: 914.921 ms

With this patch I observed the following results
1st SELECT Time: 890.820 ms
2nd SELECT Time: 884.799 ms
3rd SELECT Time: 882.405 ms

What exactly does it do? Previously, we optimised COPY when it was
loading data into a newly created table or a freshly truncated table.
This patch extends that and actually sets the tuple header flag as
HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
code. The patch also adds some tests for corner cases that would make
that action break MVCC - though those cases are minor and typical data
loads will benefit fully from this.

In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC()
and adding current xid to snapshots. That is an invasive change that I
would wish to avoid at any time and explains the long delay in
tackling this. The way I've implemented it, is just as a short test
during XidInMVCCSnapshot() so that we trap the case when the xid ==
xmax and so would appear to be running. This is much less invasive and
just as performant as Tom's original suggestion.

Why do we need this now? Setting checksums on page requires us to
write WAL for hints, so the situation of the 1st SELECT after a load
would get somewhat worse when page_checksums are enabled, but we
already know there is a price. However, this is a situation we can
solve, and add value for all cases, not just when checksums enabled.
So I'm posting this as a separate patch rather than including that as
a tuning feature of the checksums patch.

Your input will be generously received,

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..3899282 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2056,6 +2056,17 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup->t_tableOid = RelationGetRelid(relation);
 
 	/*
+	 * If we are inserting into a new relation invisible as yet to other
+	 * backends and our session has no prior snapshots and no ready portals
+	 * then we can also set the hint bit for the rows we are inserting. The
+	 * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives
+	 * the right answer if the current transaction inspects the data after
+	 * we load it.
+	 */
+	if (options & HEAP_INSERT_HINT_XMIN)
+		tup->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
+
+	/*
 	 * If the new tuple is too big for storage or contains already toasted
 	 * out-of-line attributes from some other relation, invoke the toaster.
 	 */
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 110480f..6a60eb8 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/portal.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -1922,6 +1923,10 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_SKIP_FSM;
 		if (!XLogIsNeeded())
 			hi_options |= HEAP_INSERT_SKIP_WAL;
+
+		if (ThereAreNoPriorRegisteredSnapshots() &&
+			ThereAreNoReadyPortals())
+			hi_options |= HEAP_INSERT_HINT_XMIN;
 	}
 
 	/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index cfb73c1..24075db 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -1055,3 +1055,22 @@ pg_cursor(PG_FUNCTION_ARGS)
 
 	return (Datum) 0;
 }
+
+bool
+ThereAreNoReadyPortals(void)
+{
+	HASH_SEQ_STATUS status;
+	PortalHashEnt *hentry;
+
+	hash_seq_init(&status, PortalHashTable);
+
+	while ((hentry = (PortalHashEnt *) hash_seq_search(&status)) != NULL)
+	{
+		Portal		portal = hentry->portal;
+
+		if (portal->status == PORTAL_READY)
+			return false;
+	}
+
+	return true;
+}
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 5aebbd1..5d9e3bf 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1183,3 +1183,12 @@ DeleteAllExportedSnapshotFiles(void)
 
 	FreeDir(s_dir);
 }
+
+bool
+ThereAreNoPriorRegisteredSnapshots(void)
+{
+	if (RegisteredSnapshots <= 1)
+		return true;
+
+	return false;
+}
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 31791a7..a2e5524 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -1256,7 +1256,18 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 	if (TransactionIdPrecedes(xid, snapshot->xmin))
 		return false;
 	/* Any xid >= xmax is in-progress */
-	if (TransactionIdFollowsOrEquals(xid, snapshot->xmax))
+	if (TransactionIdFollows(xid, snapshot->xmax))
+		return true;
+	/*
+	 * Make sure current xid is never thought to be in progress on tuples
+	 * that are already marked as committed - for use in bulk COPY etc..
+	 * We never included the current xid in snapshots, but if it happens
+	 * to be xmax it could still be returned as in-progress.
+	 */
+	if (TransactionIdIsCurrentTransactionId(xid))
+		return false;
+	/* All non-current xids >= xmax are seen as still running */
+	if (xid == snapshot->xmax)
 		return true;
 
 	/*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index fa38803..0381785 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -26,6 +26,7 @@
 /* "options" flag bits for heap_insert */
 #define HEAP_INSERT_SKIP_WAL	0x0001
 #define HEAP_INSERT_SKIP_FSM	0x0002
+#define HEAP_INSERT_HINT_XMIN	0x0004
 
 typedef struct BulkInsertStateData *BulkInsertState;
 
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 4833942..bd2b133 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -219,5 +219,6 @@ extern void PortalDefineQuery(Portal portal,
 extern Node *PortalListGetPrimaryStmt(List *stmts);
 extern void PortalCreateHoldStore(Portal portal);
 extern void PortalHashTableDeleteAll(void);
+extern bool ThereAreNoReadyPortals(void);
 
 #endif   /* PORTAL_H */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index f195981..789b72e 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -46,5 +46,6 @@ extern Datum pg_export_snapshot(PG_FUNCTION_ARGS);
 extern void ImportSnapshot(const char *idstr);
 extern bool XactHasExportedSnapshots(void);
 extern void DeleteAllExportedSnapshotFiles(void);
+extern bool ThereAreNoPriorRegisteredSnapshots(void);
 
 #endif   /* SNAPMGR_H */
\timing on
\! pgbench -i -s 5
\copy pgbench_accounts to '/tmp/accounts.dat'
BEGIN;
drop table if exists acctest;
create table acctest (like pgbench_accounts);
\copy acctest FROM '/tmp/accounts.dat'
select * from acctest limit 1;
commit;

select * from acctest limit 1;
SELECT count(*) FROm acctest;
SELECT count(*) FROm acctest;
SELECT count(*) FROm acctest;
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to