On Wed, Aug 25, 2021 at 5:02 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Wed, Aug 25, 2021 at 6:53 PM Ajin Cherian <itsa...@gmail.com> wrote:
> >
> > On Wed, Aug 25, 2021 at 5:43 PM Ajin Cherian <itsa...@gmail.com> wrote:
> > >
> > > On Wed, Aug 25, 2021 at 4:22 PM Amit Kapila <amit.kapil...@gmail.com>
> > > wrote:
> > > >
> > > > On Wed, Aug 25, 2021 at 8:00 AM Ajin Cherian <itsa...@gmail.com> wrote:
> > > > >
> > > > > On Tue, Aug 24, 2021 at 11:12 PM Amit Kapila
> > > > > <amit.kapil...@gmail.com> wrote:
> > > > > >
> > > > > > But will poll function still poll or exit? Have you tried that?
> > > > >
> > > > > I have forced that condition with a changed query and found that the
> > > > > poll will not exit in case of a NULL return.
> > > > >
> > > >
> > > > What if the query in a poll is fired just before we get an error
> > > > "tap_sub ERROR: replication slot "tap_sub" is active for PID 16336"?
> > > > Won't both old and new walsenders be present at that stage, so the
> > > > query might return true? You can check that via debugger by stopping
> > > > just before this error occurs and then check pg_stat_replication view.
> > >
> > > If this error happens then the PID is NOT updated as the pid in the
> > > replication slot. I have checked this
> > > and explained this in my first email itself
> > >
> >
> > Sorry about the above email, I misunderstood. I was looking at
> > pg_stat_replication_slot rather than pg_stat_replication hence the
> > confusion.
> > Amit is correct, just prior to the walsender erroring out, it briefly
> > appears in the
> > pg_stat_replication, and that is why this error happens. Sorry for the
> > confusion.
> > I just confirmed it, got both the walsenders stopped in the debugger:
> >
> > postgres=# select pid from pg_stat_replication where application_name =
> > 'sub';
> >  pid
> > ------
> >  7899
> >  7993
> > (2 rows)
>
> IIUC the query[1] used for polling returns two rows in this case: {t,
> f} or {f, t}.
> But did poll_query_until() return OK in this case even
> if we expected one row of 't'? My guess of how this issue happened is:
>

Yeah, we can check this, but I guess that as soon as it gets 't', the
poll query will exit.

> 1. the first polling query after "ALTER SUBSCRIPTION CONNECTION"
> passed (for some reason).
>

I think the reason for the exit is that we get two rows with the same
application_name in pg_stat_replication.

> 2. all wal senders exited.
> 3. get the pid of wal sender with application_name 'tap_sub' but got nothing.
> 4. the second polling query resulted in a syntax error since $oldpid is null.
>

Your understanding of the steps is the same as mine.

--
With Regards,
Amit Kapila.
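A minimal Python sketch of the two hypotheses discussed above, assuming (hypothetically) that each row returned by the poll query is 't' or 'f' and that the poll helper sees one output line per row. exact_match() models a helper that compares the whole output with 't'; any_row_true() models a poll that exits as soon as any row is 't'. build_poll_query() is a hypothetical stand-in for query[1], which is not shown here; it only illustrates why an empty $oldpid (step 3) yields broken SQL (step 4). None of these names come from the actual TAP test.

```python
"""Hypothetical model of the polling race discussed in this thread.

Assumptions (not taken from the real TAP test): each row of the poll
query is 't' or 'f', and the output has one line per row.
"""


def exact_match(rows: list[str], expected: str = "t") -> bool:
    # Hypothesis A: the whole output must equal 't', so two rows
    # ("t\nf" or "f\nt") would NOT satisfy the poll.
    return "\n".join(rows) == expected


def any_row_true(rows: list[str]) -> bool:
    # Hypothesis B: the poll is satisfied as soon as any row is 't'.
    return "t" in rows


def build_poll_query(oldpid: str, appname: str = "tap_sub") -> str:
    # Hypothetical shape of query[1]. With an empty oldpid the SQL would
    # read "pid !=  FROM ...", i.e. a syntax error, so refuse instead.
    if not oldpid.strip():
        raise ValueError("no walsender pid found for " + appname)
    return (f"SELECT pid != {oldpid} FROM pg_stat_replication "
            f"WHERE application_name = '{appname}';")


if __name__ == "__main__":
    # Old and new walsenders briefly coexist: two rows, {t, f} or {f, t}.
    for rows in (["t", "f"], ["f", "t"]):
        print(rows, exact_match(rows), any_row_true(rows))
    # After all walsenders exit, the pid lookup returns nothing.
    try:
        build_poll_query("")
    except ValueError as err:
        print("refusing to poll:", err)
```

Which of the two checks matches poll_query_until() is exactly the open question in this exchange; the sketch only makes the difference between them concrete.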