Re: 001_rep_changes.pl stalls

2020-04-20 Thread Michael Paquier
On Mon, Apr 20, 2020 at 07:24:28PM +0900, Fujii Masao wrote:
> I was misreading this as something like "any other blocking than
> the blocking in WalSndLoop()". Ok, I have no more comments on
> the patch.
Patch looks rather sane to me at quick glance. I can see that WAL senders are now not stuck

Re: 001_rep_changes.pl stalls

2020-04-20 Thread Fujii Masao
On 2020/04/20 16:02, Noah Misch wrote: On Mon, Apr 20, 2020 at 02:30:08PM +0900, Fujii Masao wrote: +* Block if we have unsent data. XXX For logical replication, let +* WalSndWaitForWal(), handle any other blocking; idle receivers need +* its

Re: 001_rep_changes.pl stalls

2020-04-20 Thread Kyotaro Horiguchi
At Mon, 20 Apr 2020 00:59:54 -0700, Noah Misch wrote in > On Mon, Apr 20, 2020 at 04:15:40PM +0900, Kyotaro Horiguchi wrote: > > At Sat, 18 Apr 2020 00:01:42 -0700, Noah Misch wrote in > > > On Fri, Apr 17, 2020 at 05:06:29PM +0900, Kyotaro Horiguchi wrote: > > > > At Fri, 17 Apr 2020 17:00:15

Re: 001_rep_changes.pl stalls

2020-04-20 Thread Noah Misch
On Mon, Apr 20, 2020 at 04:15:40PM +0900, Kyotaro Horiguchi wrote: > At Sat, 18 Apr 2020 00:01:42 -0700, Noah Misch wrote in > > On Fri, Apr 17, 2020 at 05:06:29PM +0900, Kyotaro Horiguchi wrote: > > > At Fri, 17 Apr 2020 17:00:15 +0900 (JST), Kyotaro Horiguchi > > > wrote in > > > > By the wa

Re: 001_rep_changes.pl stalls

2020-04-20 Thread Kyotaro Horiguchi
At Mon, 20 Apr 2020 14:30:08 +0900, Fujii Masao wrote in > > > On 2020/04/18 16:01, Noah Misch wrote: > > On Sat, Apr 18, 2020 at 12:29:58AM +0900, Fujii Masao wrote: > >>> 4. Keep the WalSndLoop() wait, but condition it on !logical. This is > >>> the > >>> minimal fix, but it crudely pun

Re: 001_rep_changes.pl stalls

2020-04-20 Thread Kyotaro Horiguchi
At Sat, 18 Apr 2020 00:01:42 -0700, Noah Misch wrote in > On Fri, Apr 17, 2020 at 05:06:29PM +0900, Kyotaro Horiguchi wrote: > > At Fri, 17 Apr 2020 17:00:15 +0900 (JST), Kyotaro Horiguchi > > wrote in > > > By the way, if latch is consumed in WalSndLoop, succeeding call to > > > WalSndWaitFor

Re: 001_rep_changes.pl stalls

2020-04-20 Thread Noah Misch
On Mon, Apr 20, 2020 at 02:30:08PM +0900, Fujii Masao wrote:
> + * Block if we have unsent data. XXX For logical replication, let
> + * WalSndWaitForWal(), handle any other blocking; idle receivers need
> + * its additional actions. For physical replic

Re: 001_rep_changes.pl stalls

2020-04-19 Thread Fujii Masao
On 2020/04/18 16:01, Noah Misch wrote: On Fri, Apr 17, 2020 at 05:06:29PM +0900, Kyotaro Horiguchi wrote: At Fri, 17 Apr 2020 17:00:15 +0900 (JST), Kyotaro Horiguchi wrote in By the way, if latch is consumed in WalSndLoop, succeeding call to WalSndWaitForWal cannot be woke-up by the latch-

Re: 001_rep_changes.pl stalls

2020-04-18 Thread Noah Misch
On Fri, Apr 17, 2020 at 05:06:29PM +0900, Kyotaro Horiguchi wrote: > At Fri, 17 Apr 2020 17:00:15 +0900 (JST), Kyotaro Horiguchi > wrote in > > By the way, if latch is consumed in WalSndLoop, succeeding call to > > WalSndWaitForWal cannot be woke-up by the latch-set. Doesn't that > > cause miss

Re: 001_rep_changes.pl stalls

2020-04-17 Thread Fujii Masao
On 2020/04/17 14:41, Noah Misch wrote: On Mon, Apr 13, 2020 at 09:45:16PM -0400, Tom Lane wrote: Noah Misch writes: This seems to have made the following race condition easier to hit: https://www.postgresql.org/message-id/flat/20200206074552.GB3326097%40rfd.leadboat.com https://www.postgres

Re: 001_rep_changes.pl stalls

2020-04-17 Thread Tom Lane
Kyotaro Horiguchi writes:
> At Thu, 16 Apr 2020 22:41:46 -0700, Noah Misch wrote in
>> I'm favoring (1). Other preferences?
> Starting from the current shape, I think 1 is preferable, since that
> waiting logic is no longer shared between logical and physical
> replications. But I'm not sure

Re: 001_rep_changes.pl stalls

2020-04-17 Thread Kyotaro Horiguchi
Sorry, I wrote something wrong. At Fri, 17 Apr 2020 17:00:15 +0900 (JST), Kyotaro Horiguchi wrote in > At Thu, 16 Apr 2020 22:41:46 -0700, Noah Misch wrote in > > On Mon, Apr 13, 2020 at 09:45:16PM -0400, Tom Lane wrote: > > > Noah Misch writes: > > > > This seems to have made the following

Re: 001_rep_changes.pl stalls

2020-04-17 Thread Kyotaro Horiguchi
At Thu, 16 Apr 2020 22:41:46 -0700, Noah Misch wrote in > On Mon, Apr 13, 2020 at 09:45:16PM -0400, Tom Lane wrote: > > Noah Misch writes: > > > This seems to have made the following race condition easier to hit: > > > https://www.postgresql.org/message-id/flat/20200206074552.GB3326097%40rfd.lea

Re: 001_rep_changes.pl stalls

2020-04-16 Thread Noah Misch
On Mon, Apr 13, 2020 at 09:45:16PM -0400, Tom Lane wrote: > Noah Misch writes: > > This seems to have made the following race condition easier to hit: > > https://www.postgresql.org/message-id/flat/20200206074552.GB3326097%40rfd.leadboat.com > > https://www.postgresql.org/message-id/flat/21519.158

Re: 001_rep_changes.pl stalls

2020-04-13 Thread Tom Lane
Noah Misch writes:
> This seems to have made the following race condition easier to hit:
> https://www.postgresql.org/message-id/flat/20200206074552.GB3326097%40rfd.leadboat.com
> https://www.postgresql.org/message-id/flat/21519.1585272409%40sss.pgh.pa.us
Yeah, I just came to the same guess in th

Re: 001_rep_changes.pl stalls

2020-04-13 Thread Noah Misch
On Sun, Apr 05, 2020 at 11:36:49PM -0700, Noah Misch wrote:
> Executive summary: the "MyWalSnd->write < sentPtr" in WalSndWaitForWal() is
> important for promptly updating pg_stat_replication. When caught up, we
> should impose that logic before every sleep. The one-line fix is to sleep in
> WalS

001_rep_changes.pl stalls

2020-04-05 Thread Noah Misch
caught up. On my regular development machine, src/test/subscription/t/001_rep_changes.pl stalls for ~10s at this wait_for_catchup: $node_publisher->safe_psql('postgres', "DELETE FROM tab_rep"); # Restart the publisher and check the state of the subscriber which