Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-19 Thread Fujii Masao
On 2020/04/18 0:31, Tom Lane wrote: Kyotaro Horiguchi writes: At Fri, 17 Apr 2020 16:03:34 +0900, Fujii Masao wrote in I agree that it might be worth considering the removal of am_sync for the master branch or v14. But I think that it should not be back-patched. Ah! Agreed. Yeah, tha

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-19 Thread Masahiko Sawada
On Sun, 19 Apr 2020 at 01:00, Tom Lane wrote: > > Masahiko Sawada writes: > > On Sat, 18 Apr 2020 at 00:31, Tom Lane wrote: > >> + /* Quick out if not even configured to be synchronous */ > >> + if (SyncRepConfig == NULL) > >> + return false; > > > I felt strange a bit that we do the a

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-18 Thread Tom Lane
Masahiko Sawada writes: > On Sat, 18 Apr 2020 at 00:31, Tom Lane wrote: >> + /* Quick out if not even configured to be synchronous */ >> + if (SyncRepConfig == NULL) >> + return false; > I felt strange a bit that we do the above check in > SyncRepGetSyncRecPtr() because SyncRepReleaseW

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-17 Thread Masahiko Sawada
On Sat, 18 Apr 2020 at 00:31, Tom Lane wrote: > > Kyotaro Horiguchi writes: > > At Fri, 17 Apr 2020 16:03:34 +0900, Fujii Masao > > wrote in > >> I agree that it might be worth considering the removal of am_sync for > >> the master branch or v14. But I think that it should not be > >> back-patc

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-17 Thread Tom Lane
Kyotaro Horiguchi writes: > At Fri, 17 Apr 2020 17:03:11 +0900, Masahiko Sawada > wrote in >> On Fri, 17 Apr 2020 at 14:58, Kyotaro Horiguchi >> wrote: >> Just for confirmation, since the new approach doesn't change that >> walsenders reload new config at their convenient timing, it still can

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-17 Thread Tom Lane
Kyotaro Horiguchi writes: > At Fri, 17 Apr 2020 16:03:34 +0900, Fujii Masao > wrote in >> I agree that it might be worth considering the removal of am_sync for >> the master branch or v14. But I think that it should not be >> back-patched. > Ah! Agreed. Yeah, that's not necessary to fix the b

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-17 Thread Kyotaro Horiguchi
At Fri, 17 Apr 2020 17:03:11 +0900, Masahiko Sawada wrote in > On Fri, 17 Apr 2020 at 14:58, Kyotaro Horiguchi > wrote: > > The attached is baed on syncrep-fixes-1.patch + am_sync elimination. > > > > Just for confirmation, since the new approach doesn't change that > walsenders reload new co

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-17 Thread Kyotaro Horiguchi
At Fri, 17 Apr 2020 16:03:34 +0900, Fujii Masao wrote in > I agree that it might be worth considering the removal of am_sync for > the master branch or v14. But I think that it should not be > back-patched. Ah! Agreed. regards. -- Kyotaro Horiguchi NTT Open Source Software Center

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-17 Thread Masahiko Sawada
On Fri, 17 Apr 2020 at 14:58, Kyotaro Horiguchi wrote: > > At Thu, 16 Apr 2020 11:39:06 -0400, Tom Lane wrote in > > Kyotaro Horiguchi writes: > > > [ syncrep-fixes-4.patch ] > > > > I agree that we could probably improve the clarity of this code with > > further rewriting, but I'm still very op

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-17 Thread Fujii Masao
On 2020/04/17 3:00, Tom Lane wrote: Fujii Masao writes: On 2020/04/14 22:52, Tom Lane wrote: *Yes it does*. The existing code can deliver entirely broken results if some walsender exits between where we examine the priorities and where we fetch the WAL pointers. IMO that the broken resul

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-17 Thread Fujii Masao
On 2020/04/17 14:58, Kyotaro Horiguchi wrote: At Thu, 16 Apr 2020 11:39:06 -0400, Tom Lane wrote in Kyotaro Horiguchi writes: [ syncrep-fixes-4.patch ] I agree that we could probably improve the clarity of this code with further rewriting, but I'm still very opposed to the idea of having

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-16 Thread Kyotaro Horiguchi
At Thu, 16 Apr 2020 11:39:06 -0400, Tom Lane wrote in > Kyotaro Horiguchi writes: > > [ syncrep-fixes-4.patch ] > > I agree that we could probably improve the clarity of this code with > further rewriting, but I'm still very opposed to the idea of having > callers know that the first num_sync a

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-16 Thread Tom Lane
Fujii Masao writes: > On 2020/04/14 22:52, Tom Lane wrote: >> *Yes it does*. The existing code can deliver entirely broken results >> if some walsender exits between where we examine the priorities and >> where we fetch the WAL pointers. > IMO that the broken results can be delivered when walsen

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-16 Thread Fujii Masao
On 2020/04/14 22:52, Tom Lane wrote: Kyotaro Horiguchi writes: SyncRepGetSyncStandbysPriority() is runing holding SyncRepLock so sync_standby_priority of any walsender can be changed while the function is scanning welsenders. The issue is we already have inconsistent walsender information be

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-16 Thread Tom Lane
Kyotaro Horiguchi writes: > [ syncrep-fixes-4.patch ] I agree that we could probably improve the clarity of this code with further rewriting, but I'm still very opposed to the idea of having callers know that the first num_sync array elements are the active ones. It's wrong (or at least differen

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-16 Thread Kyotaro Horiguchi
At Thu, 16 Apr 2020 16:48:28 +0900, Masahiko Sawada wrote in > This is just a notice; I'm reading your latest patch but it seems to > include unrelated changes: > > $ git diff --stat > src/backend/replication/syncrep.c | 475 > +

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-16 Thread Masahiko Sawada
On Thu, 16 Apr 2020 at 16:22, Kyotaro Horiguchi wrote: > > At Wed, 15 Apr 2020 11:31:49 -0400, Tom Lane wrote in > > Kyotaro Horiguchi writes: > > > At Tue, 14 Apr 2020 16:32:40 -0400, Tom Lane wrote in > > > + stby->is_sync_standby = true; /* might change below */ > > > > > I'm une

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-16 Thread Kyotaro Horiguchi
At Wed, 15 Apr 2020 11:31:49 -0400, Tom Lane wrote in > Kyotaro Horiguchi writes: > > At Tue, 14 Apr 2020 16:32:40 -0400, Tom Lane wrote in > > + stby->is_sync_standby = true; /* might change below */ > > > I'm uneasy with that. In quorum mode all running standbys are marked > >

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-15 Thread Tom Lane
Kyotaro Horiguchi writes: > At Tue, 14 Apr 2020 16:32:40 -0400, Tom Lane wrote in > + stby->is_sync_standby = true; /* might change below */ > I'm uneasy with that. In quorum mode all running standbys are marked > as "sync" and that's bogus. I don't follow that? The existing co

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-15 Thread Kyotaro Horiguchi
At Wed, 15 Apr 2020 13:01:02 +0900, Masahiko Sawada wrote in > On Tue, 14 Apr 2020 at 18:35, Kyotaro Horiguchi > wrote: > > > > At Tue, 14 Apr 2020 13:06:14 +0900, Masahiko Sawada > > wrote in > > > On Tue, 14 Apr 2020 at 10:34, Tom Lane wrote: > > > > > > > > Kyotaro Horiguchi writes: > >

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-15 Thread Kyotaro Horiguchi
At Wed, 15 Apr 2020 11:35:58 +0900 (JST), Kyotaro Horiguchi wrote in > I'm looking this more closer. It looks to be in the right direction to me. As mentioned in the previous mail, I removed is_sync_standby from SycnStandbyData. But just doing that breaks pg_stat_get_wal_senders. It is an exst

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-14 Thread Masahiko Sawada
On Tue, 14 Apr 2020 at 18:35, Kyotaro Horiguchi wrote: > > At Tue, 14 Apr 2020 13:06:14 +0900, Masahiko Sawada > wrote in > > On Tue, 14 Apr 2020 at 10:34, Tom Lane wrote: > > > > > > Kyotaro Horiguchi writes: > > > > At Sat, 11 Apr 2020 18:30:30 -0400, Tom Lane wrote > > > > in > > > >> Wha

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-14 Thread Noah Misch
On Tue, Apr 14, 2020 at 04:32:40PM -0400, Tom Lane wrote: > I wrote: > > It doesn't seem to me to be that hard to implement the desired > > semantics for synchronous_standby_names with inconsistent info. > > In FIRST mode you basically just need to take the N smallest > > priorities you see in the

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-14 Thread Kyotaro Horiguchi
At Tue, 14 Apr 2020 16:32:40 -0400, Tom Lane wrote in > I wrote: > > It doesn't seem to me to be that hard to implement the desired > > semantics for synchronous_standby_names with inconsistent info. > > In FIRST mode you basically just need to take the N smallest > > priorities you see in the ar

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-14 Thread Kyotaro Horiguchi
At Tue, 14 Apr 2020 09:52:42 -0400, Tom Lane wrote in > Kyotaro Horiguchi writes: > > SyncRepGetSyncStandbysPriority() is runing holding SyncRepLock so > > sync_standby_priority of any walsender can be changed while the > > function is scanning welsenders. The issue is we already have > > incons

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-14 Thread Tom Lane
I wrote: > It doesn't seem to me to be that hard to implement the desired > semantics for synchronous_standby_names with inconsistent info. > In FIRST mode you basically just need to take the N smallest > priorities you see in the array, but without assuming there are no > duplicates or holes. It

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-14 Thread Tom Lane
Kyotaro Horiguchi writes: > SyncRepGetSyncStandbysPriority() is runing holding SyncRepLock so > sync_standby_priority of any walsender can be changed while the > function is scanning welsenders. The issue is we already have > inconsistent walsender information before we enter the function. Thus >

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-14 Thread Kyotaro Horiguchi
At Tue, 14 Apr 2020 13:06:14 +0900, Masahiko Sawada wrote in > On Tue, 14 Apr 2020 at 10:34, Tom Lane wrote: > > > > Kyotaro Horiguchi writes: > > > At Sat, 11 Apr 2020 18:30:30 -0400, Tom Lane wrote in > > >> What I think we should do about this is, essentially, to get rid of > > >> SyncRepG

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-13 Thread Masahiko Sawada
On Tue, 14 Apr 2020 at 10:34, Tom Lane wrote: > > Kyotaro Horiguchi writes: > > At Sat, 11 Apr 2020 18:30:30 -0400, Tom Lane wrote in > >> What I think we should do about this is, essentially, to get rid of > >> SyncRepGetSyncStandbys. Instead, let's have each walsender advertise > >> whether *

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-13 Thread Tom Lane
Kyotaro Horiguchi writes: > At Sat, 11 Apr 2020 18:30:30 -0400, Tom Lane wrote in >> What I think we should do about this is, essentially, to get rid of >> SyncRepGetSyncStandbys. Instead, let's have each walsender advertise >> whether *it* believes that it is a sync standby, based on its last

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-12 Thread Kyotaro Horiguchi
At Mon, 13 Apr 2020 15:31:01 +0900 (JST), Kyotaro Horiguchi wrote in > At Sat, 11 Apr 2020 18:30:30 -0400, Tom Lane wrote in > > The point that I was trying to make originally is that it seems quite > > insane to imagine that a walsender's sync_standby_priority value is > > somehow more stable

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-12 Thread Kyotaro Horiguchi
At Sat, 11 Apr 2020 18:30:30 -0400, Tom Lane wrote in > Masahiko Sawada writes: > > On Tue, 31 Mar 2020 at 23:16, Masahiko Sawada > > wrote: > >>> Therefore, the band-aid fix seems to be to set the lowest priority to > >>> very large number at the beginning of SyncRepGetSyncStandbysPriority().

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-04-11 Thread Tom Lane
Masahiko Sawada writes: > On Tue, 31 Mar 2020 at 23:16, Masahiko Sawada > wrote: >>> Therefore, the band-aid fix seems to be to set the lowest priority to >>> very large number at the beginning of SyncRepGetSyncStandbysPriority(). >> I think we can use max_wal_senders. > Sorry, that's not true.

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-03-31 Thread Masahiko Sawada
On Tue, 31 Mar 2020 at 23:16, Masahiko Sawada wrote: > > On Fri, 27 Mar 2020 at 13:54, Fujii Masao wrote: > > > > > > > > On 2020/03/27 10:26, Tom Lane wrote: > > > Twice in the past month [1][2], buildfarm member hoverfly has managed > > > to reach the "unreachable" Assert(false) at the end of >

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-03-31 Thread Masahiko Sawada
On Fri, 27 Mar 2020 at 13:54, Fujii Masao wrote: > > > > On 2020/03/27 10:26, Tom Lane wrote: > > Twice in the past month [1][2], buildfarm member hoverfly has managed > > to reach the "unreachable" Assert(false) at the end of > > SyncRepGetSyncStandbysPriority. > > When I search the past discussi

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-03-30 Thread Kyotaro Horiguchi
At Fri, 27 Mar 2020 13:54:25 +0900, Fujii Masao wrote in > > > On 2020/03/27 10:26, Tom Lane wrote: > > Twice in the past month [1][2], buildfarm member hoverfly has managed > > to reach the "unreachable" Assert(false) at the end of > > SyncRepGetSyncStandbysPriority. > > When I search the pa

Re: Race condition in SyncRepGetSyncStandbysPriority

2020-03-26 Thread Fujii Masao
On 2020/03/27 10:26, Tom Lane wrote: Twice in the past month [1][2], buildfarm member hoverfly has managed to reach the "unreachable" Assert(false) at the end of SyncRepGetSyncStandbysPriority. When I search the past discussions, I found that Noah Misch reported the same issue. https://www.p

Race condition in SyncRepGetSyncStandbysPriority

2020-03-26 Thread Tom Lane
Twice in the past month [1][2], buildfarm member hoverfly has managed to reach the "unreachable" Assert(false) at the end of SyncRepGetSyncStandbysPriority. What seems likely to me, after quickly eyeballing the code, is that hoverfly is hitting the blatantly-obvious race condition in that function