Le jeudi 21 octobre 2021, 07:35:08 CEST Michael Paquier a écrit : > On Wed, Oct 20, 2021 at 02:58:26PM +0200, Ronan Dunklau wrote: > > After sending the previous patch suite, I figured it would be worthwhile > > to > > also have tests covering timeline switches, which was not covered before. > > That seems independent to me. I'll take a look. > > > So please find attached a new version with an additional patch for those > > tests, covering both "resume from last know archive" and "resume from > > the replication slots position" cases. > > So, taking things in order, I have looked at 0003 and 0001, and > attached are refined versions for both of them. > > 0003 is an existing hole in the docs, which I think we had better > address first and backpatch, taking into account that the starting > point calculation considers compressed segments when looking for > completed segments.
Ok, do you want me to propose a different patch for previous versions ? > > Regarding 0001, I have found the last test to check for NULL values > returned by READ_REPLICATION_SLOT after dropping the slot overlaps > with the first test, so I have removed that. I have expanded a bit > the use of like(), and there were some confusion with > PostgresNode::psql and some extra arguments (see DROP_REPLICATION_SLOT > and CREATE_REPLICATION_SLOT, and no need for return values in the > CREATE case either). Some comments, docs and code have been slightly > tweaked. Thank you for this. > > Here are some comments about 0002. > > + /* The commpand should always return precisely one tuple */ > s/commpand/command/ > > + pg_log_error("could not fetch replication slot: got %d rows and %d > fields, expected %d rows and %d or more fields", + > PQntuples(res), PQnfields(res), 1, 3); > Should this be "could not read" instead? > > + if (sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) != 2) > + { > + pg_log_error("could not parse slot's restart_lsn \"%s\"", > + PQgetvalue(res, 0, 1)); > + PQclear(res); > + return false; > + } > Wouldn't it be saner to initialize *restart_lsn and *restart_tli to > some default values at the top of GetSlotInformation() instead, if > they are specified by the caller? Ok. > And I think that we should still > complain even if restart_lsn is NULL. Do you mean restart_lsn as the pointer argument to the function, or restart_lsn as the field returned by the command ? If it's the first, I'll change it but if it's the latter it is expected that we sometime run this on a slot where WAL has never been reserved yet. > > On a quick read of 0004, I find the split of the logic with > change_timeline() a bit hard to understand. It looks like we should > be able to make a cleaner split, but I am not sure how that would > look, though. Thanks, at least if the proposal to test this seems sensible I can move forward. I wanted to avoid having a lot of code duplication since the test setup is a bit more complicated. My first approach was to split it into two functions, setup_standby and change_timeline, but then realized that what would happen between the two invocations would basically be the same for the two test cases, so I ended up with that patch. I'll try to see if I can see a better way of organizing that code. -- Ronan Dunklau