On Sat, Oct 23, 2021 at 1:14 PM Michael Paquier <mich...@paquier.xyz> wrote: > > On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote: > > Done. I haven't touched the timeline switch test patch for now, but I still > > include it here for completeness. > > Thanks. I have applied and back-patched 0001, then looked again at > 0002 that adds READ_REPLICATION_SLOT: > - Change the TLI to use int8 rather than int4, so as we will always be > right with TimelineID which is unsigned (this was discussed upthread > but I got back on it after more thoughts, to avoid any future > issues). > - Added an extra initialization for the set of Datum values, just as > an extra safety net. > - There was a bug with the timeline returned when executing the > command while in recovery as ThisTimeLineID is 0 in the context of a > standby, but we need to support the case of physical slots even when > streaming archives from a standby. The fix is similar to what we do > for IDENTIFY_SYSTEM, where we need to use the timeline currently > replayed from GetXLogReplayRecPtr(), before looking at the past > timeline history using restart_lsn and the replayed TLI. > > With that in place, I think that we are good now for this part.
Thanks for the updated patch. I have following comments on v10: 1) It's better to initialize nulls with false, we can avoid setting them to true. The instances where the columns are not nulls is going to be more than the columns with null values, so we could avoid some of nulls[i] = false; instructions. + MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool)); I suggest we do the following. The number of instances of hitting the "else" parts will be less. MemSet(nulls, false, READ_REPLICATION_SLOT_COLS * sizeof(bool)); if (slot == NULL || !slot->in_use) { MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool)); LWLockRelease(ReplicationSlotControlLock); } if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)) { } else nulls[i] = true; if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)) { } else nulls[i] = true; 2) As I previously mentioned, we are not copying the slot contents while holding the spinlock, it's just we are taking the memory address and releasing the lock, so there is a chance that the memory we are looking at can become unavailable or stale while we access slot_contents. So, I suggest we do the memcpy of the *slot to slot_contents. I'm sure the memcpy ing the entire ReplicationSlot contents will be costlier, so let's just take the info that we need (data.database, data.restart_lsn) into local variables while we hold the spin lock + /* Copy slot contents while holding spinlock */ + SpinLockAcquire(&slot->mutex); + slot_contents = *slot; + SpinLockRelease(&slot->mutex); + LWLockRelease(ReplicationSlotControlLock); The code will look like following: Oid database; XLogRecPtr restart_lsn; /* Take required information from slot contents while holding spinlock */ SpinLockAcquire(&slot->mutex); database= slot->data.database; restart_lsn= slot->data.restart_lsn; SpinLockRelease(&slot->mutex); LWLockRelease(ReplicationSlotControlLock); 3) The data that the new command returns to the client can actually become stale while it is captured and in transit to the client as we release the spinlock and other backends can drop or alter the info. So, it's better we talk about this in the documentation of the new command and also in the comments saying "clients will have to deal with it." 4) How about we be more descriptive about the error added? This will help identify for which replication slot the command has failed from tons of server logs which really help in debugging and analysis. I suggest we have this: errmsg("cannot use \"%s\" command with logical replication slot \"%s\"", "READ_REPLICATION_SLOT", cmd->slotname); instead of just a plain, non-informative, generic message: errmsg("cannot use \"%s\" with logical replication slots", Regards, Bharath Rupireddy.