On Thu, Aug 26, 2021 at 7:38 AM Ajin Cherian <itsa...@gmail.com> wrote:
>
> On Thu, Aug 26, 2021 at 11:02 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
>
> Luckily these logs have the disconnection times of the tap test client
> sessions as well. (not sure why I don't see these when I run these
> tests).
>
> Step 5 could have happened anywhere between 18:44:38.016 and 18:44:38.063
> 18:44:38.016 CEST [16474:3] 001_rep_changes.pl LOG: statement: SELECT
> pid != 16336 FROM pg_stat_replication WHERE application_name =
> 'tap_sub';
> :
> :
> 18:44:38.063 CEST [16474:4] 001_rep_changes.pl LOG: disconnection:
> session time: 0:00:00.063 user=nm database=postgres host=[local]
>
> When the query starts both walsenders are present but when the query
> completes both walsenders are gone, the actual query evaluation could
> have happened any time in between. This is the rare timing window that
> causes this problem.
>

You have a point, but if we look at the logs below, the second
walsender (#step6) seems to have exited before the first walsender
(#step4).

2021-08-15 18:44:38.041 CEST [16475:10] tap_sub LOG: disconnection: session time: 0:00:00.036 user=nm database=postgres host=[local]
2021-08-15 18:44:38.043 CEST [16336:14] tap_sub LOG: disconnection: session time: 0:00:06.367 user=nm database=postgres host=[local]

Isn't it possible that the pid is cleared in the other order, and that
this is why we are seeing this problem?

--
With Regards,
Amit Kapila.