On Sun, Mar 4, 2012 at 2:28 AM, Marti Raudsepp <ma...@juffo.org> wrote: > On Sat, Mar 3, 2012 at 14:53, Simon Riggs <si...@2ndquadrant.com> wrote: >> Thanks Noah for drawing attention to this thread. I hadn't been >> watching. As you say, this work would allow me to freeze rows at load >> time and avoid the overhead of hint bit setting, which avoids >> performance issues from hint bit setting in checksum patch. >> >> I've reviewed this patch and it seems OK to me. Good work Marti. > > Thanks! This approach wasn't universally liked, but if it gets us > tangible benefits (COPY with frozenxid) then I guess it's a reason to > reconsider.
Comments I see support this idea. If we did this purely for truncate correctness I probably wouldn't care either, which is why I didn't read this thread in the first place. >> v2 patch attached, updated to latest HEAD. Patch adds >> * relvalidxid is reset by VACUUM/ANALYZE to avoid wraparound failure > > Personally I'd rather keep this out of ANALYZE -- since its purpose is > to collect stats; VACUUM is responsible for correctness (xid > wraparound etc). But I don't feel strongly about this. If there were a reason to do it, then I would agree. Later you point out a reason, so I will make this change. > A more important consideration is how this interacts with hot standby. > Currently you compare OldestXmin to relvalidxmin to decide when to > reset it. But the standby's OldestXmin may be older than the master's. > (If VACUUM removes rows then this causes a recovery conflict, but > AFAICT that won't happen if only relvalidxmin changes) As of 9.1, the standby's oldestxmin is incorporated into the master's via hot_standby_feedback, so it wouldn't typically be a problem these days. It's possible that the standby has this set off by choice, in which case anomalies could exist in the case that a VACUUM doesn't clean any rows, as you say. So we'll use vacrelstats->latestRemovedXid instead of OldestXmin when we call vac_update_relstats() which means ANALYZE should always pass InvalidTransactionId. > It might be more robust to wait until relfrozenxid exceeds > relvalidxmin -- by then, recovery conflict mechanisms will have taken > care of killing all older snapshots, or am I mistaken? It would be better to set it as early as possible to reduce the cost of the test in heap_begin_scan() > And a few typos the code... > > + gettext_noop("When enabled viewing a truncated or newly created table " > + "will throw a serialization error to prevent MVCC " > + "discrepancies that were possible priot to 9.2.") > > "prior" not "priot" Yep > + * Reset relvalidxmin if its old enough > > Should be "it's" in this context. Cool, thanks for the review. v3 attached. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index c910863..4387f49 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -73,7 +73,7 @@ /* GUC variable */ bool synchronize_seqscans = true; - +bool StrictMVCC = true; static HeapScanDesc heap_beginscan_internal(Relation relation, Snapshot snapshot, @@ -1175,6 +1175,24 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot, HeapScanDesc scan; /* + * If the snapshot is older than relvalidxmin then that either a table + * has only recently been created or that a TRUNCATE has removed data + * that we should still be able to see. Either way, we aren't allowed + * to view the rows in StrictMVCC mode. + */ + if (StrictMVCC && + TransactionIdIsValid(relation->rd_rel->relvalidxmin) && + TransactionIdIsValid(snapshot->xmax) && + NormalTransactionIdPrecedes(snapshot->xmax, relation->rd_rel->relvalidxmin)) + { + ereport(ERROR, + (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), + errmsg("canceling statement due to conflict with TRUNCATE TABLE on %s", + NameStr(relation->rd_rel->relname)), + errdetail("User query attempting to see row versions that have been removed."))); + } + + /* * increment relation ref count while scanning relation * * This is just to make really sure the relcache entry won't go away while diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index aef410a..3f9bd5d 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -787,6 +787,7 @@ InsertPgClassTuple(Relation pg_class_desc, values[Anum_pg_class_relhastriggers - 1] = BoolGetDatum(rd_rel->relhastriggers); values[Anum_pg_class_relhassubclass - 1] = BoolGetDatum(rd_rel->relhassubclass); values[Anum_pg_class_relfrozenxid - 1] = TransactionIdGetDatum(rd_rel->relfrozenxid); + values[Anum_pg_class_relvalidxmin - 1] = TransactionIdGetDatum(rd_rel->relvalidxmin); if (relacl != (Datum) 0) values[Anum_pg_class_relacl - 1] = relacl; else @@ -884,6 +885,7 @@ AddNewRelationTuple(Relation pg_class_desc, new_rel_reltup->relfrozenxid = InvalidTransactionId; } + new_rel_reltup->relvalidxmin = InvalidTransactionId; new_rel_reltup->relowner = relowner; new_rel_reltup->reltype = new_type_oid; new_rel_reltup->reloftype = reloftype; diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index bfbe642..0d96a6a 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -2854,7 +2854,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks) } /* We'll build a new physical relation for the index */ - RelationSetNewRelfilenode(iRel, InvalidTransactionId); + RelationSetNewRelfilenode(iRel, InvalidTransactionId, InvalidTransactionId); /* Initialize the index and rebuild */ /* Note: we do not need to re-establish pkey setting */ diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index b40e57b..44b891c 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -538,6 +538,7 @@ do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh) totalrows, visibilitymap_count(onerel), hasindex, + InvalidTransactionId, InvalidTransactionId); /* @@ -558,6 +559,7 @@ do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh) totalindexrows, 0, false, + InvalidTransactionId, InvalidTransactionId); } } diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c index 1f95abc..ab4aec2 100644 --- a/src/backend/commands/sequence.c +++ b/src/backend/commands/sequence.c @@ -287,7 +287,7 @@ ResetSequence(Oid seq_relid) * Create a new storage file for the sequence. We want to keep the * sequence's relfrozenxid at 0, since it won't contain any unfrozen XIDs. */ - RelationSetNewRelfilenode(seq_rel, InvalidTransactionId); + RelationSetNewRelfilenode(seq_rel, InvalidTransactionId, InvalidTransactionId); /* * Insert the modified tuple into the new storage file. diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index cd4490a..aa9d2e9 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -19,6 +19,7 @@ #include "access/reloptions.h" #include "access/relscan.h" #include "access/sysattr.h" +#include "access/transam.h" #include "access/xact.h" #include "catalog/catalog.h" #include "catalog/dependency.h" @@ -1121,7 +1122,7 @@ ExecuteTruncate(TruncateStmt *stmt) * as the relfilenode value. The old storage file is scheduled for * deletion at commit. */ - RelationSetNewRelfilenode(rel, RecentXmin); + RelationSetNewRelfilenode(rel, RecentXmin, GetCurrentTransactionId()); heap_relid = RelationGetRelid(rel); toast_relid = rel->rd_rel->reltoastrelid; @@ -1132,7 +1133,12 @@ ExecuteTruncate(TruncateStmt *stmt) if (OidIsValid(toast_relid)) { rel = relation_open(toast_relid, AccessExclusiveLock); - RelationSetNewRelfilenode(rel, RecentXmin); + RelationSetNewRelfilenode(rel, RecentXmin, InvalidTransactionId); + + /* + * We don't need a cmin here because there can't be any open + * cursors in the same session as the TRUNCATE command. + */ heap_close(rel, NoLock); } diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index 353af50..9b36cd7 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -573,7 +573,8 @@ void vac_update_relstats(Relation relation, BlockNumber num_pages, double num_tuples, BlockNumber num_all_visible_pages, - bool hasindex, TransactionId frozenxid) + bool hasindex, TransactionId frozenxid, + TransactionId oldestxmin) { Oid relid = RelationGetRelid(relation); Relation rd; @@ -647,6 +648,21 @@ vac_update_relstats(Relation relation, dirty = true; } + /* + * Reset relvalidxmin if it's old enough. Resetting this value + * could cause MVCC failures on a standby that doesn't have + * hot_standby_feedback enabled, so only reset the value if we + * are certain that all prior snapshots have ended or been + * removed via conflict. Value should always be invalid for indexes. + */ + if (TransactionIdIsValid(oldestxmin) && + TransactionIdIsNormal(pgcform->relvalidxmin) && + TransactionIdPrecedes(pgcform->relvalidxmin, oldestxmin)) + { + pgcform->relvalidxmin = InvalidTransactionId; + dirty = true; + } + /* If anything changed, write out the tuple. */ if (dirty) heap_inplace_update(rd, ctup); diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c index 2fc749e..3b88652 100644 --- a/src/backend/commands/vacuumlazy.c +++ b/src/backend/commands/vacuumlazy.c @@ -255,7 +255,8 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt, new_rel_tuples, new_rel_allvisible, vacrelstats->hasindex, - new_frozen_xid); + new_frozen_xid, + vacrelstats->latestRemovedXid); /* report results to the stats collector, too */ pgstat_report_vacuum(RelationGetRelid(onerel), @@ -1185,6 +1186,7 @@ lazy_cleanup_index(Relation indrel, stats->num_index_tuples, 0, false, + InvalidTransactionId, InvalidTransactionId); ereport(elevel, diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index a59950e..17303e8 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -2605,7 +2605,8 @@ RelationBuildLocalRelation(const char *relname, * the XIDs that will be put into the new relation contents. */ void -RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid) +RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid, + TransactionId validxmin) { Oid newrelfilenode; RelFileNodeBackend newrnode; @@ -2674,6 +2675,10 @@ RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid) } classform->relfrozenxid = freezeXid; + if (validxmin != InvalidTransactionId && + TransactionIdPrecedes(classform->relvalidxmin, validxmin)) + classform->relvalidxmin = validxmin; + simple_heap_update(pg_class, &tuple->t_self, tuple); CatalogUpdateIndexes(pg_class, tuple); diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 486bdcd..e4a19aa 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -1430,6 +1430,19 @@ static struct config_bool ConfigureNamesBool[] = }, { + {"strict_mvcc", PGC_POSTMASTER, COMPAT_OPTIONS_PREVIOUS, + gettext_noop("Enables backward compatibility mode of strict MVCC " + "for TRUNCATE and CREATE TABLE."), + gettext_noop("When enabled viewing a truncated or newly created table " + "will throw a serialization error to prevent MVCC " + "discrepancies that were possible prior to 9.2.") + }, + &StrictMVCC, + true, + NULL, NULL, NULL + }, + + { {"quote_all_identifiers", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS, gettext_noop("When generating SQL fragments, quote all identifiers."), NULL, diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 96da086..f56e374 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -548,6 +548,7 @@ #quote_all_identifiers = off #sql_inheritance = on #standard_conforming_strings = on +#strict_mvcc = on #synchronize_seqscans = on # - Other Platforms and Clients - diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h index 0335347..223f157 100644 --- a/src/include/catalog/catversion.h +++ b/src/include/catalog/catversion.h @@ -53,6 +53,6 @@ */ /* yyyymmddN */ -#define CATALOG_VERSION_NO 201203021 +#define CATALOG_VERSION_NO 201203031 #endif diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h index 1567206..d6ee70c 100644 --- a/src/include/catalog/pg_class.h +++ b/src/include/catalog/pg_class.h @@ -67,6 +67,7 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO bool relhastriggers; /* has (or has had) any TRIGGERs */ bool relhassubclass; /* has (or has had) derived classes */ TransactionId relfrozenxid; /* all Xids < this are frozen in this rel */ + TransactionId relvalidxmin; /* data is only valid for snapshots > this Xid */ #ifdef CATALOG_VARLEN /* variable-length fields start here */ /* NOTE: These fields are not present in a relcache entry's rd_rel field. */ @@ -77,7 +78,7 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO /* Size of fixed part of pg_class tuples, not counting var-length fields */ #define CLASS_TUPLE_SIZE \ - (offsetof(FormData_pg_class,relfrozenxid) + sizeof(TransactionId)) + (offsetof(FormData_pg_class,relvalidxmin) + sizeof(TransactionId)) /* ---------------- * Form_pg_class corresponds to a pointer to a tuple with @@ -91,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class; * ---------------- */ -#define Natts_pg_class 27 +#define Natts_pg_class 28 #define Anum_pg_class_relname 1 #define Anum_pg_class_relnamespace 2 #define Anum_pg_class_reltype 3 @@ -117,8 +118,9 @@ typedef FormData_pg_class *Form_pg_class; #define Anum_pg_class_relhastriggers 23 #define Anum_pg_class_relhassubclass 24 #define Anum_pg_class_relfrozenxid 25 -#define Anum_pg_class_relacl 26 -#define Anum_pg_class_reloptions 27 +#define Anum_pg_class_relvalidxmin 26 +#define Anum_pg_class_relacl 27 +#define Anum_pg_class_reloptions 28 /* ---------------- * initial contents of pg_class @@ -130,13 +132,13 @@ typedef FormData_pg_class *Form_pg_class; */ /* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId */ -DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 _null_ _null_ )); +DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 0 _null_ _null_ )); DESCR(""); -DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 _null_ _null_ )); +DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 0 _null_ _null_ )); DESCR(""); -DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 _null_ _null_ )); +DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 0 _null_ _null_ )); DESCR(""); -DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 _null_ _null_ )); +DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 0 _null_ _null_ )); DESCR(""); diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h index 4526648..fa79231 100644 --- a/src/include/commands/vacuum.h +++ b/src/include/commands/vacuum.h @@ -151,7 +151,8 @@ extern void vac_update_relstats(Relation relation, double num_tuples, BlockNumber num_all_visible_pages, bool hasindex, - TransactionId frozenxid); + TransactionId frozenxid, + TransactionId oldestxmin); extern void vacuum_set_xid_limits(int freeze_min_age, int freeze_table_age, bool sharedRel, TransactionId *oldestXmin, diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index 610cb59..9d2be31 100644 --- a/src/include/miscadmin.h +++ b/src/include/miscadmin.h @@ -342,6 +342,8 @@ extern ProcessingMode Mode; #define GetProcessingMode() Mode +/* in access/heap/heapam.c */ +extern bool StrictMVCC; /***************************************************************************** * pinit.h -- * diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h index 2f39486..60c2eb6 100644 --- a/src/include/utils/relcache.h +++ b/src/include/utils/relcache.h @@ -76,7 +76,8 @@ extern Relation RelationBuildLocalRelation(const char *relname, * Routine to manage assignment of new relfilenode to a relation */ extern void RelationSetNewRelfilenode(Relation relation, - TransactionId freezeXid); + TransactionId freezeXid, + TransactionId validxmin); /* * Routines for flushing/rebuilding relcache entries in various scenarios diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule index 669c0f2..bba722a 100644 --- a/src/test/isolation/isolation_schedule +++ b/src/test/isolation/isolation_schedule @@ -14,3 +14,4 @@ test: fk-contention test: fk-deadlock test: fk-deadlock2 test: eval-plan-qual +test: truncate diff --git a/src/test/isolation/expected/truncate.out b/src/test/isolation/expected/truncate.out new file mode 100644 index ...2b9f161 *** a/src/test/isolation/expected/truncate.out --- b/src/test/isolation/expected/truncate.out *************** *** 0 **** --- 1,63 ---- + Parsed test spec with 2 sessions + + starting permutation: begin select roll trunc + step begin: + BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; + -- Force snapshot open + SELECT 1 AS foo FROM txid_current(); + + foo + + 1 + step select: SELECT * FROM foo; + i + + 1 + step roll: ROLLBACK; + step trunc: TRUNCATE TABLE foo; + + starting permutation: begin select trunc roll + step begin: + BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; + -- Force snapshot open + SELECT 1 AS foo FROM txid_current(); + + foo + + 1 + step select: SELECT * FROM foo; + i + + 1 + step trunc: TRUNCATE TABLE foo; <waiting ...> + step roll: ROLLBACK; + step trunc: <... completed> + + starting permutation: begin trunc select roll + step begin: + BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; + -- Force snapshot open + SELECT 1 AS foo FROM txid_current(); + + foo + + 1 + step trunc: TRUNCATE TABLE foo; + step select: SELECT * FROM foo; + ERROR: canceling statement due to conflict with TRUNCATE TABLE on foo + step roll: ROLLBACK; + + starting permutation: trunc begin select roll + step trunc: TRUNCATE TABLE foo; + step begin: + BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; + -- Force snapshot open + SELECT 1 AS foo FROM txid_current(); + + foo + + 1 + step select: SELECT * FROM foo; + i + + step roll: ROLLBACK; diff --git a/src/test/isolation/specs/truncate.spec b/src/test/isolation/specs/truncate.spec new file mode 100644 index ...2c7b707 *** a/src/test/isolation/specs/truncate.spec --- b/src/test/isolation/specs/truncate.spec *************** *** 0 **** --- 1,23 ---- + setup + { + CREATE TABLE foo (i int); + INSERT INTO foo VALUES (1); + } + + teardown + { + DROP TABLE foo; + } + + session "s1" + step "begin" + { + BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; + -- Force snapshot open + SELECT 1 AS foo FROM txid_current(); + } + step "select" { SELECT * FROM foo; } + step "roll" { ROLLBACK; } + + session "s2" + step "trunc" { TRUNCATE TABLE foo; }
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers