Hi, Mats
On Sun, 24 May 2026 at 20:30, Mats Kindahl <[email protected]> wrote: > On Fri, May 22, 2026 at 12:09 AM surya poondla <[email protected]> > wrote: > > Hi Mats, > > Thanks for picking this up -- the scenario is a real one and I think the > UUID-tagging approach is a clean way to > solve it. v2 applies and builds without trouble, and the core algorithm > reads well to me. > I have a handful of observations that I'd love your thoughts. > > Hi Surya, > > Thank you for the review. It is a quite rare scenario, but it is real and the > fix is simple. > > Regarding Correctness I have the below thoughts > > 1. UUIDv7 timestamp epoch. > In StartupXLOG(): > TimestampTz now = GetCurrentTimestamp(); > generate_uuidv7_r(&uuid_buf, (uint64)(now / 1000), > (uint32)(now % 1000) * 1000); > > I think there might be a small mismatch here: GetCurrentTimestamp() returns > microseconds since the Postgres epoch > (2000-01-01), > whereas generate_uuidv7_r describes its first argument as milliseconds since > the Unix epoch. > As written that 30-year offset would land in the UUID's timestamp field, so > the resulting UUID wouldn't be a > conformant UUIDv7 and wouldn't > time-order against UUIDv7s generated through the SQL functions. > > > > Uniqueness is preserved either way, so the rewind logic still works as > intended but it seemed worth flagging. > > I see conversion that's used elsewhere as: > us = ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) > * SECS_PER_DAY * USECS_PER_SEC; > > Or, since promotion isn't on a hot path, gettimeofday() / time(NULL) > directly would also be fine. > > Yes, the intention was to use a proper timestamp to allow debugging servers > if necessary. Switched to gettimeofday() and > used 0 for sub-ms since this is not going to be critical. (We could use ns > here as well, but that would only solve a race > if you have two servers being promoted in the same ms, which I find unlikely, > and there is a random number added for that > situation.) > > 2. EOR-record path, the intent is unclear. > > The comment above generate_uuidv7_r() at says: > > "The same UUID is written into the history file and later into the > XLOG_END_OF_RECOVERY record so that pg_rewind can > distinguish two servers..." > > But from what I can see only the history-file part actually lands. > xl_end_of_recovery is unchanged, CreateEndOfRecoveryRecord() doesn't add the > UUID, and XLogCtl->ThisTimeLineUUID is > written under info_lck without a > reader (I couldn't grep it). > > The xlog_redo() memset() + Min(rec_len, sizeof(...)) change reads like > preparation for an EOR-struct extension that > ended up not being part of the patch. > > Was the EOR-record piece something you intended to keep for a follow-up, or > has it been superseded by the > history-file approach? > > No, the EOR changes are not needed for the promotion, contrary to what I > originally thought. Cleaned up the comment and > the code and removed all traces of changes to the EOR (I hope). > > > > 3. Malformed UUID handling in readTimeLineHistory(). > > The optional field-4 path is: > > if (nfields == 4 && strlen(uuid_str) == UUID_STR_LEN) > { > Datum datum = DirectFunctionCall1(uuid_in, > CStringGetDatum(uuid_str)); > ... > } > > uuid_in() raises ereport(ERROR) on a malformed input, while the surrounding > syntax-error paths in readTimeLineHistory > () use FATAL deliberately. > In practice an ERROR during startup ends up being fatal too, so this isn't > strictly a bug but it would be nicer to > stay consistent. > > Agree. I added code to capture the error and raise a FATAL instead (with the > error message from the uuid_in, in case it > is modified it makes sense to show this). > > Regarding the Tests I have the following thoughts > > The two new cases are nice, a few extensions that I think would strengthen > them: > 1. A mixed-version case where one side has a zero UUID. That's the path > we're claiming is graceful, but nothing > currently exercises it > > Yes, that should work regardless of whether the source or the target has the > zero UUID. > > I realized one thing: if two timelines have identical TLI but one has zero > UUID and one has not, it seems they could not > come from the same promotion (one promotion happened on an old server and the > other one on a new server), that is, they > should be treated as different. Does that make sense? I made the necessary > changes in the attached patches for testing. > Please have a look. > > 2. A deeper-divergence case (e.g. TLI1->2->3 vs TLI1->2->3') so that > findCommonAncestorTimeline's loop walks past > matching entries > before hitting the mismatch. The 0002 test puts the divergence at depth > 1. > > I was unsure if this test was necessary or interesting, hence a separate > commit. Since you thought it was useful, it's > now rolled into the patch and I extended the tests with the scenarios you > suggested. > > I also did some refactorings of the tests to avoid duplication. More below. > > 3. A small assertion against the on-disk 00000002.history contents, to pin > down the file format. > 4. On 0002 the dependency on restore_command pointing at node_x's pg_wal is > the kind of thing that tends to break > under > environment changes. A CHECKPOINT on node_x before the backup, or > wal_keep_size as in 0001, would let the test > stand on its own. > > Good point. > > I refactored the code to avoid some duplication and make the test flow > self-explanatory and as part of that I set the > wal_keep_size for all nodes. > > In the process I noticed that many of the functions in RewindTest.pm do the > same job as the primitives I wrote, but have > hard-coded variable names. I could rewrite them to take parameters, but that > would be quite a big patch to add additional > changes to each call site, so I did not do that and rather added small > wrappers specific for the tests in > 005_same_timeline.pl⚠️. > > Attached a new version of the now single patch. > > I'm happy to keep reviewing/contributing, thanks again for working on it. > > Thank you for reviewing it. Thank you for your work. I have one comment. + a = &tlh->source[tlh->sourceNentries - 2].tluuid; + b = &tlh->target[tlh->targetNentries - 2].tluuid; + + if (memcmp(a, &zero, UUID_LEN) == 0 && memcmp(b, &zero, UUID_LEN) == 0) + return true; + + return memcmp(a, b, UUID_LEN) == 0; Since we already have matchingTimelineUUID(), the above code can be simplified using it. -- Regards, Japin Li ChengDu WenWu Information Technology Co., Ltd.
