On Wed, Aug 25, 2021 at 11:04 PM Ajin Cherian <itsa...@gmail.com> wrote:
>
> On Wed, Aug 25, 2021 at 11:17 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Wed, Aug 25, 2021 at 6:10 PM Masahiko Sawada <sawada.m...@gmail.com>
> > wrote:
> > >
> > > I did a quick check with the following tap test code:
> > >
> > > $node_publisher->poll_query_until('postgres',
> > >     qq(
> > > select 1 != foo.column1 from (values(0), (1)) as foo;
> > > ));
> > >
> > > The query returns {t, f} but poll_query_until() never finished. The
> > > same is true when the query returns {f, t}.
> > >
> >
> > Yes, this is true, I also see the same behaviour.
> >
> > This means something different is going on in Ajin's setup. Ajin, can
> > you please share how did you confirm your findings about poll_query?
>
> Relooking at my logs, I think what happens is this:
>
> 1. First walsender 'a' is running.
> 2. Second walsender 'b' starts and attempts at acquiring the slot
> finds that the slot is active for pid a.
> 3. Now both walsenders are active, the query does not return.
> 4. First walsender 'a' times out and exits.
> 5. Now only the second walsender is active and the query returns OK
> because pid != a.
> 6. Second walsender exits with error.
> 7. Another query attempts to get the pid of the running walsender for
> tap_sub but returns null because both walsender exits.
> 8. This null return value results in the next query erroring out and
> the test failing.

So this is slightly different than what we can see in the topminnow logs?
According to the server logs, step #5 happened (at 18:44:38.016) before
step #4 happened (at 18:44:38.043).

>
> >Can you additionally check the value of 'state' from
> >pg_stat_replication for both the old and new walsender sessions?
>
> Yes, will try this and post a patch tomorrow.

Thanks. I guess the state of the new walsender should be "startup"
whereas the old one should be "streaming".

Regards,

-- 
Masahiko Sawada
EDB: https://www.enterprisedb.com/
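
For reference, here is a rough sketch of why the quick check quoted above never finishes when the query returns two rows. It is a Python simulation, not the Perl code in the TAP framework, and it assumes that poll_query_until() compares the whole psql output (unaligned, tuples only, one row per line) with the expected string, which defaults to 't'. The names poll_matches, run_query and max_attempts are hypothetical and exist only for this illustration.

```python
def poll_matches(rows, expected="t"):
    """Model one polling attempt.

    Assumption: psql prints one row per line, and the whole output
    (trailing newline removed) must equal `expected` exactly.
    """
    stdout = "\n".join(rows)
    return stdout == expected


def poll_query_until(run_query, expected="t", max_attempts=5):
    """Retry until the output matches, giving up after max_attempts.

    `run_query` is a hypothetical callable returning the result rows as
    a list of strings. The real helper also sleeps between attempts;
    that is left out here to keep the sketch short.
    """
    for _ in range(max_attempts):
        if poll_matches(run_query(), expected):
            return True
    return False


if __name__ == "__main__":
    # A single-row result of 't' ends the polling.
    print(poll_query_until(lambda: ["t"]))
    # Two rows give "t\nf" (or "f\nt"), which is never equal to "t".
    print(poll_query_until(lambda: ["t", "f"]))
    print(poll_query_until(lambda: ["f", "t"]))
```

Under that assumption a result of {t, f} or {f, t} can never satisfy the check, so the polling only ends when the retry limit is reached. That matches what was observed with the values(0), (1) query.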