On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote:
>
> On Thursday, February 15, 2024 5:20 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > On Thu, Feb 15, 2024 at 9:05 AM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote:
> > >
> > > On Thursday, February 15, 2024 10:49 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > >
> > > > On Wed, Feb 14, 2024 at 7:26 PM Bertrand Drouvot
> > > >
> > > > Right, we can do that or probably this test would have made more
> > > > sense with a worker patch where we could wait for the slot to be synced.
> > > > Anyway, let's try to recreate the slot/subscription idea. BTW, do
> > > > you think that adding a LOG when we are not able to sync will help
> > > > in debugging such problems? I think eventually we can change it to
> > > > DEBUG1 but for now, it can help with stabilizing BF and or some other
> > > > reported issues.
> > >
> > > Here is the patch that attempts the re-create sub idea.
> > >
> >
> > Pushed this.
> >
> > >
> > > I also think that a LOG/DEBUG
> > > would be useful for such analysis, so the 0002 is to add such a log.
> > >
> >
> > I feel such a LOG would be useful.
> >
> > + ereport(LOG,
> > + errmsg("waiting for remote slot \"%s\" LSN (%X/%X) and catalog xmin"
> > + " (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)",
> >
> > I think waiting is a bit misleading here, how about something like:
> > "could not sync slot information as remote slot precedes local slot:
> > remote slot \"%s\": LSN (%X/%X), catalog xmin (%u) local slot: LSN (%X/%X),
> > catalog xmin (%u)"
>
> Changed.
>
> Attach the v2 patch here.
>
> Apart from the new log message. I think we can add one more debug message in
> reserve_wal_for_local_slot, this could be useful to analyze the failure.

Yeah, that can also be helpful, but the added message looks naive to me.

+ elog(DEBUG1, "segno: %ld oldest_segno: %ld", oldest_segno, segno);

Instead of the above, how about something like:
"segno: %ld of proposed restart_lsn for the synced slot, oldest_segno: %ld available"?

> And we
> can also enable the DEBUG log in the 040 tap-test, I see we have similar
> setting in 010_logical_decoding_timline and logging debug1 message doesn't
> increase noticeable time on my machine. These are done in 0002.
>

I haven't tested it, but I think this can help in debugging BF failures,
if any. I am not sure whether to keep it that way permanently, but until
these tests are stabilized, this sounds like a good idea. So, how about
making the test changes a separate patch so that we can easily
revert/remove it later if required?

Bertrand, do you have any thoughts on this?

--
With Regards,
Amit Kapila.