At Wed, 28 Jul 2021 12:57:39 +0200, Ronan Dunklau <ronan.dunk...@aiven.io> wrote in > Le mercredi 28 juillet 2021, 08:22:30 CEST Kyotaro Horiguchi a écrit : > > At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan.dunk...@aiven.io> > > wrote in > > > I don't know if it should be the default, toggled by a command line flag, > > > or if we even should let the user provide a LSN. > > > > *I* think it is completely reasonable (or at least convenient or less > > astonishing) that pg_receivewal starts from the restart_lsn of the > > replication slot to use. The tool already decides the clean-start LSN > > a bit unusual way. And it seems to me that proposed behavior can be > > the default when -S is specified. > > > > As of now we can't get the replication_slot restart_lsn with a replication > connection AFAIK. > > This implies that the patch could require the user to specify a > maintenance-db > parameter, and we would use that if provided to fetch the replication slot > info, or fallback to the previous behaviour. I don't really like this > approach > as the behaviour changing wether we supply a maintenance-db parameter or not > is error-prone for the user. > > Another option would be to add a new replication command (for example > ACQUIRE_REPLICATION_SLOT <slot_name>) to set the replication slot as the > current one, and return some info about it (restart_lsn at least for a > physical slot).
I didn't thought in details. But I forgot that ordinary SQL commands have been prohibited in physical replication connection. So we need a new replication command but it's not that a big deal. > I don't see any reason not to make it work for logical replication > connections > / slots, but it wouldn't be that useful since we can query the database in > that case. Ordinary SQL queries are usable on a logical replication slot so I'm not sure how logical replication connection uses the command. However, like you, I wouldn't bother restricting the command to physical replication, but perhaps the new command should return the slot type. > Acquiring the replication slot instead of just reading it would make sure > that > no other process could start the replication between the time we read the > restart_lsn and when we issue START_REPLICATION. START_REPLICATION could then > check if we already have a replication slot, and ensure it is the same one as > the one we're trying to use. I'm not sure it's worth adding complexity for such strictness. START_REPLICATION safely fails if someone steals the slot meanwhile. In the first place there's no means to protect a slot from others while idle. One possible problem is the case where START_REPLICATION successfully acquire the slot after the new command failed. But that case doesn't seem worse than the case someone advances the slot while absence. So I think READ_REPLICATION_SLOT is sufficient. > From pg_receivewal point of view, this would amount to: > > - check if we currently have wal in the target directory. > - if we do, proceed as currently done, by computing the start lsn and > timeline from the last archived wal > - if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the > restart_lsn as the start lsn if there is one, and don't provide a timeline > - if we still don't have a start_lsn, fallback to using the current server > wal position as is done. That's pretty much it. > What do you think ? Which information should we provide about the slot ? We need the timeline id to start with when using restart_lsn. The current timeline can be used in most cases but there's a case where the LSN is historical. pg_receivewal doesn't send a replication status report when a segment is finished. So after pg_receivewal stops just after a segment is finished, the slot stays at the beginning of the last segment. Thus next time it will start from there, creating a duplicate segment. regards. -- Kyotaro Horiguchi NTT Open Source Software Center