Hi, On 2021-10-19 15:13:04 -0400, Robert Haas wrote: > This is a followup to > http://postgr.es/m/ca+tgmoz5a26c6oxkapafyuy_sx0vg6vxdd_q6asezsvrphd...@mail.gmail.com. > I'm suspicious of the following code in CreateReplicationSlot: > > /* setup state for WalSndSegmentOpen */ > sendTimeLineIsHistoric = false; > sendTimeLine = ThisTimeLineID; > > The first thing that's odd about this is that if this is physical > replication, it's apparently dead code, because AFAICT sendTimeLine > will not be used for anything in that case.
It's quite confusing. It's *really* not helped by physical replication using but not really using an xlogreader to keep state. Which presumably isn't actually used during a physical CreateReplicationSlot(), but is referenced by a comment :/ > But I don't know if it matters. We call CreateInitDecodingContext() > with sendTimeLine and ThisTimeLineID still zero; it doesn't call any > callbacks. Then we call DecodingContextFindStartpoint() with > sendTimeLine still 0 and the first callback that gets invoked is > logical_read_xlog_page. At this point sendTimeLine = 0 and > ThisTimeLineID = 0. That calls XLogReadDetermineTimeline() which > resets ThisTimeLineID to the correct value of 2, but when we get back > to logical_read_xlog_page, we still manage to call WALRead with a > timeline of 0 because state->seg.ws_tli is still 0. And when WALRead > eventually does call WalSndOpen, which unconditionally propagates > sendTimeLine into the TLI pointer that is passed to it. So now > state->seg_ws_tli also ends up being 2. So I guess maybe nothing bad > happens? But it sure seems strange that the code would apparently work > just as well as it does today with the following patch: > > diff --git a/src/backend/replication/walsender.c > b/src/backend/replication/walsender.c > index b811a5c0ef..44fd598519 100644 > --- a/src/backend/replication/walsender.c > +++ b/src/backend/replication/walsender.c > @@ -945,7 +945,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd) > > /* setup state for WalSndSegmentOpen */ > sendTimeLineIsHistoric = false; > - sendTimeLine = ThisTimeLineID; > + sendTimeLine = rand() % 10; > > if (cmd->kind == REPLICATION_KIND_PHYSICAL) > { Istm we should introduce an InvalidTimeLineID, and explicitly initialize sendTimeLine to that, and assert that it's valid / invalid in a bunch of places? Greetings, Andres Freund