I spent a couple of hours on this patchset today. I merged 0001 and 0002, and decided the result was still messier than I would have liked, so I played with it a bit more -- see attached. I think this is committable, but I'm afraid it'll cause quite a few conflicts with the rest of your series.
I had two gripes, which I feel solved with my changes: 1. I didn't like that "dir" and "wal segment size" were part of the "currently open segment" supporting struct. It seemed that those two were slightly higher-level, since they apply to every segment that's going to be opened, not just the current one. My first thought was to put those as members of XLogReaderState, but that doesn't work because the physical walsender.c code does not use xlogreader at all, even though it is reading WAL. Anyway my solution was to create yet another struct, which for everything that uses xlogreader is just part of that state struct; and for walsender, it's just a separate one alongside sendSeg. All in all, this seems pretty clean. 2. Having the wal dir be #ifdef FRONTEND seemed out of place. I know the backend code does not use that, but eliding it is more "noisy" than just setting it to NULL. Also, the "Finalize the segment pointer" thingy seemed out of place. So my code passes the dir as an argument to XLogReaderAllocate, and if it's null then we just don't allocate it. Everybody else can use it to guide things. This results in cleaner code, because we don't have to handle it externally, which was causing quite some pain to pg_waldump. Note that ws_dir member is a char array in the struct, not just a pointer. This saves trouble trying to allocate it (I mainly did it this way because we don't have pstrdup_extended(MCXT_ALLOC_NO_OOM) ... yes, this could be made with palloc+snprintf, but eh, that doesn't seem worth the trouble.) Separately from those two API-wise points, there was one bug which meant that with your 0002+0003 the recovery tests did not pass -- code placement bug. I suppose the bug disappears with later patches in your series, which probably is why you didn't notice. This is the fix for that: - XLogRead(cur_page, state->seg.size, state->seg.tli, targetPagePtr, - state->seg.tli = pageTLI; + state->seg.ws_tli = pageTLI; + XLogRead(cur_page, state->segcxt.ws_segsize, state->seg.ws_tli, targetPagePtr, XLOG_BLCKSZ); ... Also, yes, I renamed all the struct members. If you don't have any strong dislikes for these changes, I'll push this part and let you rebase the remains on top. Regarding the other patches: 1. I think trying to do palloc(XLogReadError) is a bad idea ... for example, if the read fails because of system pressure, we might return "out of memory" during that palloc instead of the real read error. This particular problem you could forestall by changing to ErrorContext, but I have the impression that it might be better to have the error struct by stack-allocated in the caller stack. This forces you to limit the message string to a maximum size (say 128 bytes or maybe even 1000 bytes like MAX_ERRORMSG_LEN) but I don't have a problem with that. 2. Not a fan of the InvalidTimeLineID stuff offhand. Maybe it's okay ... not convinced yet either way. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services