On Mon, Feb 24, 2014 at 10:11 AM, Andres Freund <and...@2ndquadrant.com> wrote: > Changes in this version include: > * changed slot error handling log by introducing "ephermal" slots which > get dropped on errors. This is the biggest change. > * added quoting in the test_decoding output plugin > * closing of a tight race condition during slot creation where WAL could > have been removed > * comment and other adjustments, many of them noticed by robert
I did another read-through of this this afternoon, focusing on the stuff you changed and parts I hadn't looked at carefully yet. Comments below. Documentation needs to be updated for pg_stat_replication view. I still think pg_create_logical_replication_slot should be in slotfuncs.c. /* Size of an indirect datum that contains an indirect TOAST pointer */ #define INDIRECT_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(struct varatt_indirect)) +/* Size of an indirect datum that contains a standard TOAST pointer */ +#define INDIRECT_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(struct varatt_indirect)) Isn't the new hunk a duplicate of the existing definition, except for a one-word change to the comment? I don't think the completely-unsecured directory operations in test_decoding_regsupport.c are acceptable. Tom fought tooth and nail to make sure that similar capabilities in adminpack carried meaningful security restrictions. /* + * Check whether there are, possibly unconnected, logical slots that refer + * to the to-be-dropped database. The database lock we are holding + * prevents the creation of new slots using the database. + */ + if (ReplicationSlotsCountDBSlots(db_id, &nslots, &nslots_active)) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_IN_USE), + errmsg("database \"%s\" is used in a logical decoding slot", + dbname), + errdetail("There are %d slot(s), %d of them active", + nslots, nslots_active))); What are you going to do when we get around to supporting this on a standby? Whatever the answer is, maybe add a TODO comment. + * loop for now.. + * more than twice.. Extra periods. + * The replicatio slot mechanism is used to prevent removal of required Typo. + + /* + * GetRunningTransactionData() acquired ProcArrayLock, we must release + * it. We can do that before inserting the WAL record because + * ProcArrayApplyRecoveryInfo can recheck the commit status using the + * clog. If we're doing logical replication we can't do that though, so + * hold the lock for a moment longer. + */ + if (wal_level < WAL_LEVEL_LOGICAL) + LWLockRelease(ProcArrayLock); + recptr = LogCurrentRunningXacts(running); + /* Release lock if we kept it longer ... */ + if (wal_level >= WAL_LEVEL_LOGICAL) + LWLockRelease(ProcArrayLock); + This seems unfortunate. The comment should clearly explain why it's necessary. + /* + * Startup logical state, needs to be setup now so we have proper data + * during restore. + */ + StartupReorderBuffer(); Should add blank line before this. + CheckPointSnapBuild(); + CheckpointLogicalRewriteHeap(); Shouldn't the capitalization be consistent? - heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin); + if (IsSystemRelation(scan->rs_rd) + || RelationIsAccessibleInLogicalDecoding(scan->rs_rd)) + heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin); + else + heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalDataXmin); Instead of changing the callers of heap_page_prune_opt() in this way, I think it might be better to change heap_page_prune_opt() to take only the first two of its current three parameters; everybody's just passing RecentGlobalXmin right now anyway. Then, we could change the first check in heap_page_prune_opt() to check first whether PageIsPrunable(page, RecentGlobalDataXmin). If not, give up. If so, then check that (!IsSystemRelation(scan->rs_rd) && !RelationIsAccessibleInLogicalDecoding(scan->rs_rd)) || PageIsPrunable(page, RecentGlobalXmin)). The advantage of this is that we avoid code duplication, and we avoid checking a couple of conditions if pd_prune_xmin is very recent. - if (nrels > 0 || nmsgs > 0 || RelcacheInitFileInval || forceSyncCommit) + if (nrels > 0 || nmsgs > 0 || RelcacheInitFileInval || forceSyncCommit || + XLogLogicalInfoActive()) Mmph. Is this really necessary? If so, why? The comments could elucidate. + bool fail_softly = slot->data.persistency == RS_EPHEMERAL; This should be contingent on whether we're being called in the error pathway, not the slot type. I think you should pass a bool. There are a bunch of places where you're testing IsSystemRelation() || RelationIsAccessibleInLogicalDecoding(). Maybe we need a macro encapsulating that test, with a name chose to explain the point of it. It seems to be indicating, roughly, whether the relation should participate in RecentGlobalXmin or RecentGlobalDataXmin. But is there any point at all of separating those when !XLogLogicalInfoActive()? The test expands to: IsSystemRelation() || (XLogLogicalInfoActive() && RelationNeedsWAL(relation) && (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation))) So basically this is all tables created in pg_catalog during initdb plus all TOAST tables in the system. If wal_level=logical, then we also include tables marked with the reloption user_catalog_table=true, unless they're unlogged. This all seems a bit complex. Why not this: IsSystemRelation() || || RelationIsUsedAsCatalogTable(relation) And why not this? IsCatalogRelation() || || RelationIsUsedAsCatalogTable(relation) i.e. is it really necessary to include all TOAST tables, or does it suffice to include TOAST tables of system catalogs? I bet you're going to tell me that we don't know which TOAST tables pertain to user-catalog tables, and thus must include them all. Ugh. + /* + * It's important *not* to track decoding tasks here because + * snapbuild.c uses ->oldestRunningXid to manage its xmin. If it + * were to be included here the initial value could never + * increase. + */ This is not clear, and it uses the " ->member" syntax which I find confusing and inelegant. lazy_vacuum_rel() takes the relation as an argument, so why does it need the to caller to compute IsSystemRelation(onerel) || RelationIsAccessibleInLogicalDecoding(onerel)? Header comment for ReplicationSlotDropAcquired() is bogus. ReplicationSlotDropAcquired() can easily avoid using a "goto" with a short else block. I'd suggest if rename() == 0 then fsync() else ereport(). + * GetOldestSafeDecodingTransactionId -- lowest xid not affected by vacuum It seems to me that this is the lowest XID known not to have been pruned, whether by vacuum or otherwise. + /* ---- + * This is a bit tricky: We need to determine a safe xmin horizon to start + * decoding from, to avoid starting from a running xacts record referring + * to xids whose rows have been vacuumed or pruned + * already. GetOldestSafeDecodingTransactionId() returns such a value, but + * without further interlock it's return value might immediately be out of + * date. + * + * So we have to acquire the ProcArrayLock to prevent computation of new + * xmin horizons by other backends, get the safe decoding xid, and inform + * the slot machinery about the new limit. Once that's done the + * ProcArrayLock can be be released as the slot machinery now is + * protecting against vacuum. + * ---- + */ I can't claim to be very excited about this. I'm assuming you've spent a lot of time thinking about ways to avoid this and utterly failed to come up with any reasonable alternative, but let me take a shot. Suppose we take ProcArrayLock in exclusive mode and compute the oldest running XID, install it as our xmin, and then release ProcArrayLock. At that point, nobody else can compute an oldest-xmin value that precedes that value, so we can take our time installing that value as the slot's xmin, without needing to hold a lock meanwhile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers