Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-03-06 Thread Michael Paquier
On Wed, Mar 06, 2024 at 05:45:56PM +, Bertrand Drouvot wrote: > Thank you both for the report! I did a few test manually and can see the issue > from times to times. When the issue occurs, the logical decoding was able to > go through the place where LogicalConfirmReceivedLocation() updates the

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-03-06 Thread Bertrand Drouvot
Hi, On Wed, Mar 06, 2024 at 05:45:56PM +0530, Bharath Rupireddy wrote: > On Wed, Mar 6, 2024 at 4:51 PM Michael Paquier wrote: > > > > On Wed, Mar 06, 2024 at 09:17:58AM +, Bertrand Drouvot wrote: > > > Right, somehow out of context here. > > > > We're not yet in the green yet, one of my anim

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-03-06 Thread Bharath Rupireddy
On Wed, Mar 6, 2024 at 4:51 PM Michael Paquier wrote: > > On Wed, Mar 06, 2024 at 09:17:58AM +, Bertrand Drouvot wrote: > > Right, somehow out of context here. > > We're not yet in the green yet, one of my animals has complained: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-03-06 Thread Michael Paquier
On Wed, Mar 06, 2024 at 09:17:58AM +, Bertrand Drouvot wrote: > Right, somehow out of context here. We're not yet in the green yet, one of my animals has complained: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&dt=2024-03-06%2010%3A10%3A03 This is telling us that the xmin hor

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-03-06 Thread Bertrand Drouvot
Hi, On Wed, Mar 06, 2024 at 02:47:55PM +0900, Michael Paquier wrote: > On Tue, Mar 05, 2024 at 10:17:03AM +, Bertrand Drouvot wrote: > > On Tue, Mar 05, 2024 at 09:42:20AM +0900, Michael Paquier wrote: > > The issue is that then the new ASSERT is > > triggered leading to the standby shutdown.

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-03-05 Thread Michael Paquier
On Tue, Mar 05, 2024 at 10:17:03AM +, Bertrand Drouvot wrote: > On Tue, Mar 05, 2024 at 09:42:20AM +0900, Michael Paquier wrote: >> First, in a build where 818fefd8fd is included, this makes the test >> script a lot slower. Most of the logic is quick, but we're spending >> 10s or so checking t

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-03-05 Thread Bertrand Drouvot
Hi, On Tue, Mar 05, 2024 at 09:42:20AM +0900, Michael Paquier wrote: > On Mon, Feb 26, 2024 at 02:01:45PM +, Bertrand Drouvot wrote: > > Though [1] mentioned up-thread is not pushed yet, I'm Sharing the POC patch > > now > > (see the attached). > > I have looked at what you have here. Thank

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-03-04 Thread Michael Paquier
On Mon, Feb 26, 2024 at 02:01:45PM +, Bertrand Drouvot wrote: > Though [1] mentioned up-thread is not pushed yet, I'm Sharing the POC patch > now > (see the attached). I have looked at what you have here. First, in a build where 818fefd8fd is included, this makes the test script a lot slower

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-02-26 Thread Bertrand Drouvot
Hi, On Tue, Feb 20, 2024 at 04:03:53PM +, Bertrand Drouvot wrote: > Hi, > > On Tue, Feb 20, 2024 at 02:33:44PM +0900, Michael Paquier wrote: > > On Tue, Feb 20, 2024 at 08:51:17AM +0900, Michael Paquier wrote: > > > Prefixing these with "initial_" is fine, IMO. That shows the > > > intention

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-02-20 Thread Bertrand Drouvot
Hi, On Tue, Feb 20, 2024 at 02:33:44PM +0900, Michael Paquier wrote: > On Tue, Feb 20, 2024 at 08:51:17AM +0900, Michael Paquier wrote: > > Prefixing these with "initial_" is fine, IMO. That shows the > > intention that these come from the slot's data before doing the > > termination. So I'm OK

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-02-19 Thread Michael Paquier
On Tue, Feb 20, 2024 at 08:51:17AM +0900, Michael Paquier wrote: > Prefixing these with "initial_" is fine, IMO. That shows the > intention that these come from the slot's data before doing the > termination. So I'm OK with what's been proposed in v3. I was looking at that a second time, and jus

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-02-19 Thread Michael Paquier
On Mon, Feb 19, 2024 at 09:49:24AM +, Bertrand Drouvot wrote: > On Mon, Feb 19, 2024 at 01:45:16PM +0530, Bharath Rupireddy wrote: >> Prefix 'initial_' makes the variable names a bit longer, I think we >> can just use effective_xmin, catalog_effective_xmin and restart_lsn, >> the code updating

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-02-19 Thread Bertrand Drouvot
Hi, On Mon, Feb 19, 2024 at 01:45:16PM +0530, Bharath Rupireddy wrote: > On Mon, Feb 19, 2024 at 11:44 AM Michael Paquier wrote: > > > > > Yeah, comments added in v3. > > > > The contents look rather OK, I may do some word-smithing for both. > > Here are some comments on v3: Thanks for looing a

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-02-19 Thread Bharath Rupireddy
On Mon, Feb 19, 2024 at 11:44 AM Michael Paquier wrote: > > > Yeah, comments added in v3. > > The contents look rather OK, I may do some word-smithing for both. Here are some comments on v3: 1. +XLogRecPtrinitial_effective_xmin = InvalidXLogRecPtr; +XLogRecPtrinitial_catalog_effe

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-02-18 Thread Bertrand Drouvot
Hi, On Mon, Feb 19, 2024 at 03:14:07PM +0900, Michael Paquier wrote: > On Thu, Feb 15, 2024 at 11:24:59AM +, Bertrand Drouvot wrote: > > Thanks for looking at it! > > > > Agree, using an assertion instead in v3 attached. > > And you did not forget the PG_USED_FOR_ASSERTS_ONLY. Thanks to the

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-02-18 Thread Michael Paquier
On Thu, Feb 15, 2024 at 11:24:59AM +, Bertrand Drouvot wrote: > Thanks for looking at it! > > Agree, using an assertion instead in v3 attached. And you did not forget the PG_USED_FOR_ASSERTS_ONLY. > > It seems important to document why we are saving this data here; while > > we hold the LWLo

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-02-15 Thread Bertrand Drouvot
Hi, On Thu, Feb 15, 2024 at 02:09:45PM +0900, Michael Paquier wrote: > On Thu, Jan 18, 2024 at 02:20:28PM +, Bertrand Drouvot wrote: > > On Thu, Jan 18, 2024 at 04:59:39PM +0530, Bharath Rupireddy wrote: > >> I'm not sure if it > >> can happen at all, but I think we can rely on previous confli

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-02-14 Thread Michael Paquier
On Thu, Jan 18, 2024 at 02:20:28PM +, Bertrand Drouvot wrote: > On Thu, Jan 18, 2024 at 04:59:39PM +0530, Bharath Rupireddy wrote: >> I'm not sure if it >> can happen at all, but I think we can rely on previous conflict reason >> instead of previous effective_xmin/effective_catalog_xmin/restart

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-01-18 Thread Bertrand Drouvot
Hi, On Thu, Jan 18, 2024 at 04:59:39PM +0530, Bharath Rupireddy wrote: > IIUC, the issue is that we terminate the process holding the > replication slot, and the conflict cause that we recorded before > terminating the process changes in the next iteration due to the > advancement in effective_xmi

Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

2024-01-18 Thread Bharath Rupireddy
On Mon, Jan 15, 2024 at 1:18 PM Bertrand Drouvot wrote: > > Hi hackers, > > While working on [1], we discovered (thanks Alexander for the testing) that an > conflicting active logical slot on a standby could be "terminated" without > leading to an "obsolete" message (see [2]). > > Indeed, in case