On Tue, Mar 23, 2021 at 3:09 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Mon, Mar 22, 2021 at 12:20 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > >
> > > On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <and...@anarazel.de> wrote:
> > > >
> > > > - If max_replication_slots was lowered between a restart,
> > > >   pgstat_read_statfile() will happily write beyond the end of
> > > >   replSlotStats.
> > >
> > > I think we cannot restart the server after lowering
> > > max_replication_slots to a value less than the number of replication
> > > slots actually created on the server. No?
> >
> > This problem happens in the case where max_replication_slots is
> > lowered and there still are stats for a slot.
> >
>
> I think this can happen only if the drop message is lost, right?

Yes, I think you're right. In that case, the stats file could have
more slot statistics than the lowered max_replication_slots.

>
> > I understood the risk of running out of replSlotStats. If we use the
> > index in replSlotStats instead, IIUC we need to somehow synchronize
> > the indexes in between replSlotStats and
> > ReplicationSlotCtl->replication_slots. The order of replSlotStats is
> > preserved across restarting whereas the order of
> > ReplicationSlotCtl->replication_slots isn’t (readdir() that is used by
> > StartupReplicationSlots() doesn’t guarantee the order of the returned
> > entries in the directory). Maybe we can compare the slot name in the
> > received message to the name in the element of replSlotStats. If they
> > don’t match, we swap entries in replSlotStats to synchronize the index
> > of the replication slot in ReplicationSlotCtl->replication_slots and
> > replSlotStats. If we cannot find the entry in replSlotStats that has
> > the name in the received message, it probably means either it's a new
> > slot or the previous create message is dropped, we can create the new
> > stats for the slot. Is that what you mean, Andres?
> >
>
> I wonder how in this scheme, we will remove the risk of running out of
> 'replSlotStats' and still restore correct stats assuming the drop
> message is lost? Do we want to check after restoring each slot info
> whether the slot with that name exists?

Yeah, I think we need such a check at least if the number of slot
stats in the stats file is larger than max_replication_slots. Or we
can do that at every startup to remove orphaned slot stats.

Regards,

-- 
Masahiko Sawada
EDB: https://www.enterprisedb.com/
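
For illustration only, the startup check discussed above could look roughly like the sketch below. This is not PostgreSQL code: SlotStat, slot_exists() and restore_slot_stats() are hypothetical names, the struct is a simplified stand-in for a saved stats entry, and the list of existing slot names is assumed to be available at the time the stats file is read. The idea is to keep a saved entry only if a slot with that name still exists, and never to write past the capacity of the destination array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64

/* Hypothetical, simplified stand-in for one saved slot stats entry. */
typedef struct SlotStat
{
    char name[NAMEDATALEN];
    long spill_txns;
} SlotStat;

/* Return true if 'name' is among the slots that currently exist. */
static bool
slot_exists(const char *name, const char *const *live, size_t nlive)
{
    for (size_t i = 0; i < nlive; i++)
    {
        if (strcmp(live[i], name) == 0)
            return true;
    }
    return false;
}

/*
 * Copy saved entries into 'dst' (capacity 'max_slots'), keeping only those
 * whose slot still exists.  Entries without a matching slot are treated as
 * orphaned (for example, because the drop message was lost) and skipped.
 * Returns the number of entries kept; never writes beyond dst[max_slots - 1].
 */
static size_t
restore_slot_stats(const SlotStat *saved, size_t nsaved,
                   const char *const *live, size_t nlive,
                   SlotStat *dst, size_t max_slots)
{
    size_t kept = 0;

    for (size_t i = 0; i < nsaved; i++)
    {
        if (!slot_exists(saved[i].name, live, nlive))
            continue;           /* orphaned stats entry: drop it */
        if (kept >= max_slots)
            break;              /* defensive bound on the destination array */
        dst[kept++] = saved[i];
    }
    return kept;
}
```

With this shape, the number of restored entries is bounded both by the number of existing slots and by max_slots, so a stats file holding more entries than the lowered max_replication_slots cannot overrun the array. Whether to run the name check only in that case or at every startup is the open question above.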