On Tue, Aug 31, 2021 at 12:45 PM Andres Freund <and...@anarazel.de> wrote: > > Hi, > > On 2021-08-31 11:37:08 +0900, Masahiko Sawada wrote: > > At step #2, wal sender waits for another transaction started at step > > #1 to complete after creating the replication slot. When the server is > > stopping, wal sender process drops the slot on releasing the slot > > since it's still RS_EPHEMERAL. Then, after dropping the slot we report > > the message for dropping the slot (see ReplicationSlotDropPtr()). > > These are executed in ReplicationSlotRelease() called by ProcKill() > > which is called during calling on_shmem_exit callbacks, which is after > > shutting down pgstats during before_shmem_exit callbacks. I’ve not > > tested yet but I think this can potentially happen also when dropping > > a temporary slot. ProcKill() also calls ReplicationSlotCleanup() to > > clean up temporary slots. > > > > There are some ideas to fix this issue but I don’t think it’s a good > > idea to move either ProcKill() or the slot releasing code to > > before_shmem_exit in this case, like we did for other similar > > issues[1][2]. > > Yea, that's clearly not an option. > > I wonder why the replication slot stuff is in ProcKill() rather than its > own callback. That's probably my fault, but I don't remember what lead > to that. > > > > Reporting the slot dropping message on dropping the slot > > isn’t necessarily essential actually since autovacuums periodically > > check already-dropped slots and report to drop the stats. So another > > idea would be to move pgstat_report_replslot_drop() to a higher layer > > such as ReplicationSlotDrop() and ReplicationSlotsDropDBSlots() that > > are not called during callbacks. The replication slot stats are > > dropped when it’s dropped via commands such as > > pg_drop_replication_slot() and DROP_REPLICATION_SLOT. On the other > > hand, for temporary slots and ephemeral slots, we rely on autovacuums > > to drop their stats. Even if we delay to drop the stats for those > > slots, pg_stat_replication_slots don’t show the stats for > > already-dropped slots. > > Yea, we could do that, but I think it'd be nicer to find a bit more > principled solution... > > Perhaps moving this stuff out from ProcKill() into its own > before_shmem_exit() callback would do the trick?
You mean to move only the part of sending the message to its own before_shmem_exit() callback? or move ReplicationSlotRelease() and ReplicationSlotCleanup() from ProcKill() to it? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/