On Tue, Dec 26, 2017 at 10:03 PM, Petr Jelinek <petr.jeli...@2ndquadrant.com> wrote: > On 26/12/17 11:13, Masahiko Sawada wrote: >> On Tue, Dec 26, 2017 at 12:49 AM, Petr Jelinek >> <petr.jeli...@2ndquadrant.com> wrote: >> >>>> >>>> It's not a problem on crash restart because StartupReorderBuffer already >>>> does the required delete. >>>> >>>> ReorderBufferSerializeTXN, which spills the txns to disk, doesn't appear >>>> to have any guard to ensure that the segment files don't already exist >>>> when it goes to serialize a snapshot. Adding one there would probably be >>>> expensive; we don't know the last lsn of the txn yet, so to be really >>>> safe we'd have to do a directory listing and scan for any txn-$OURXID-* >>>> entries. >>>> >>>> So to fix, I suggest that we should do a >>>> slot-specific StartupReorderBuffer-style deletion when we start a new >>>> decoding session on a slot, per attached patch. >>>> >>>> It might be nice to also add a hook on proc exit, so we don't have stale >>>> buffers lying around until next decoding session, but I didn't want to >>>> add new global state to reorderbuffer.c so I've left that for now. >>> >>> >>> Hmm, can't we simply call the new cleanup function in >>> ReplicationSlotRelease()? That's called during process exit and we know >>> there about slot so no extra global variables are needed. >>> >> >> I guess that ReplicationSlotRelease() currently might not get called >> if walsender exits by proc_exit(). ReplicationSlotRelease() can is >> called by some functions such as WalSndErrorCleanup(), but at least in >> the case where wal sender exits due to failed to write data to socket, >> ReplicationSlotRelease() didn't get called as far as I tested. >> > > Are you sure about that testing? Did you test it with replication slot > active? ReplicationSlotRelease() is called from ProcKill() if the > process is using a slot and should be called for any kind of exit except > for outright crash (the kind that makes postgres kill all backends). If > it wasn't we'd never unlock the replication slot used by the exiting > walsender. >
Ah, sorry I was wrong. WalSndErrorCleanup() didn't get called but ReplicationSlotRelease() got called. I agree that cleanup function gets called in ReplicationSlotRelease(). Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center