Hi, On 2021-08-31 11:37:08 +0900, Masahiko Sawada wrote: > At step #2, wal sender waits for another transaction started at step > #1 to complete after creating the replication slot. When the server is > stopping, wal sender process drops the slot on releasing the slot > since it's still RS_EPHEMERAL. Then, after dropping the slot we report > the message for dropping the slot (see ReplicationSlotDropPtr()). > These are executed in ReplicationSlotRelease() called by ProcKill() > which is called during calling on_shmem_exit callbacks, which is after > shutting down pgstats during before_shmem_exit callbacks. I’ve not > tested yet but I think this can potentially happen also when dropping > a temporary slot. ProcKill() also calls ReplicationSlotCleanup() to > clean up temporary slots. > > There are some ideas to fix this issue but I don’t think it’s a good > idea to move either ProcKill() or the slot releasing code to > before_shmem_exit in this case, like we did for other similar > issues[1][2].
Yea, that's clearly not an option. I wonder why the replication slot stuff is in ProcKill() rather than its own callback. That's probably my fault, but I don't remember what lead to that. > Reporting the slot dropping message on dropping the slot > isn’t necessarily essential actually since autovacuums periodically > check already-dropped slots and report to drop the stats. So another > idea would be to move pgstat_report_replslot_drop() to a higher layer > such as ReplicationSlotDrop() and ReplicationSlotsDropDBSlots() that > are not called during callbacks. The replication slot stats are > dropped when it’s dropped via commands such as > pg_drop_replication_slot() and DROP_REPLICATION_SLOT. On the other > hand, for temporary slots and ephemeral slots, we rely on autovacuums > to drop their stats. Even if we delay to drop the stats for those > slots, pg_stat_replication_slots don’t show the stats for > already-dropped slots. Yea, we could do that, but I think it'd be nicer to find a bit more principled solution... Perhaps moving this stuff out from ProcKill() into its own before_shmem_exit() callback would do the trick? Greetings, Andres Freund