Re: Race condition in InvalidateObsoleteReplicationSlots()

2022-02-22 Thread Andres Freund
Hi, On 2022-02-22 17:48:55 -0800, Andres Freund wrote: > I think the minor changes might unfortunately have introduced a race? Before > the patch just used ConditionVariableSleep(), but now it also has a > ConditionVariablePrepareToSleep(). Without re-checking the sleep condition > until >

Re: Race condition in InvalidateObsoleteReplicationSlots()

2022-02-22 Thread Andres Freund
Hi, On 2021-06-11 12:27:57 -0400, Álvaro Herrera wrote: > On 2021-Jun-10, Álvaro Herrera wrote: > > > Here's a version that I feel is committable (0001). There was an issue > > when returning from the inner loop, if in a previous iteration we had > > released the lock. In that case we need to r

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-07-05 Thread vignesh C
On Mon, Jul 5, 2021 at 10:30 PM Álvaro Herrera wrote: > > On 2021-Jul-05, vignesh C wrote: > > > On Wed, Jun 23, 2021 at 7:32 PM Álvaro Herrera > > wrote: > > > > > > On 2021-Jun-20, Tom Lane wrote: > > > > > > > Actually ... isn't there a second race, in the opposite direction? > > > > IIUC, th

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-07-05 Thread Álvaro Herrera
On 2021-Jul-05, vignesh C wrote: > On Wed, Jun 23, 2021 at 7:32 PM Álvaro Herrera > wrote: > > > > On 2021-Jun-20, Tom Lane wrote: > > > > > Actually ... isn't there a second race, in the opposite direction? > > > IIUC, the point of this is that once we force some WAL to be sent > > > to the fro

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-07-05 Thread vignesh C
On Wed, Jun 23, 2021 at 7:32 PM Álvaro Herrera wrote: > > On 2021-Jun-20, Tom Lane wrote: > > > Actually ... isn't there a second race, in the opposite direction? > > IIUC, the point of this is that once we force some WAL to be sent > > to the frozen sender/receiver, they'll be killed for failure

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-23 Thread Álvaro Herrera
On 2021-Jun-20, Tom Lane wrote: > Actually ... isn't there a second race, in the opposite direction? > IIUC, the point of this is that once we force some WAL to be sent > to the frozen sender/receiver, they'll be killed for failure to > respond. But the advance_wal call is not the only possible c

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-20 Thread Tom Lane
I wrote: > I studied this failure a bit more, and I think the test itself has > a race condition. It's doing > > # freeze walsender and walreceiver. Slot will still be active, but walreceiver > # won't get anything anymore. > kill 'STOP', $senderpid, $receiverpid; > $logstart = get_log_size($node_

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-20 Thread Tom Lane
I wrote: > Hmm ... desmoxytes has failed this test once, out of four runs since > it went in: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2021-06-19%2003%3A06%3A04 I studied this failure a bit more, and I think the test itself has a race condition. It's doing # freeze

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-20 Thread Álvaro Herrera
Hah, desmoxytes failed once: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2021-06-19%2003%3A06%3A04 I'll revert it and investigate later. There have been no other failures. -- Álvaro Herrera39°49'30"S 73°17'W "Hay que recordar que la existenci

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-19 Thread Tom Lane
=?utf-8?Q?=C3=81lvaro?= Herrera writes: > It occurred to me that this could be made better by sigstopping both > walreceiver and walsender, then letting only the latter run; AFAICS this > makes the test stable. I'll register this on the upcoming commitfest to > let cfbot run it, and if it looks g

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-18 Thread Álvaro Herrera
Apologies, I inadvertently sent the version before I added a maximum number of iterations in the final loop. -- Álvaro Herrera Valdivia, Chile "La fuerza no está en los medios físicos sino que reside en una voluntad indomable" (Gandhi) >From 1492e9468ecd86167a1253c4a2e9b31139835649 Mon Sep

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-18 Thread Álvaro Herrera
On 2021-Jun-11, Álvaro Herrera wrote: > I tried hard to make this stable, but it just isn't (it works fine one > thousand runs, then I grab some coffee and run it once more and that one > fails. Why? that's not clear to me). Attached is the last one I have, > in case somebody wants to make it b

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-11 Thread Andres Freund
On 2021-06-11 15:52:21 -0400, Álvaro Herrera wrote: > On 2021-Apr-07, Andres Freund wrote: > > > After this I don't see a reason to have SAB_Inquire - as far as I can > > tell it's practically impossible to use without race conditions? Except > > for raising an error - which is "builtin"... > > P

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-11 Thread Álvaro Herrera
On 2021-Apr-07, Andres Freund wrote: > After this I don't see a reason to have SAB_Inquire - as far as I can > tell it's practically impossible to use without race conditions? Except > for raising an error - which is "builtin"... Pushed 0002. Thanks! -- Álvaro Herrera

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-11 Thread Álvaro Herrera
On 2021-Jun-10, Álvaro Herrera wrote: > Here's a version that I feel is committable (0001). There was an issue > when returning from the inner loop, if in a previous iteration we had > released the lock. In that case we need to return with the lock not > held, so that the caller can acquire it a

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-10 Thread Álvaro Herrera
On 2021-Jun-10, Álvaro Herrera wrote: > I wrote a test (0002) to cover the case of signalling a walsender, which > is currently not covered (we only deal with the case of a standby that's > not running). There are some sharp edges in this code -- I had to make > it use background_psql() to send a

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-10 Thread Álvaro Herrera
Here's a version that I feel is committable (0001). There was an issue when returning from the inner loop, if in a previous iteration we had released the lock. In that case we need to return with the lock not held, so that the caller can acquire it again, but weren't. This seems pretty hard to h

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-04-30 Thread Andres Freund
Hi, On 2021-04-29 13:28:20 -0400, Álvaro Herrera wrote: > On 2021-Apr-07, Andres Freund wrote: > > > I'm also confused by the use of ConditionVariableTimedSleep(timeout = > > 10). Why do we need a timed sleep here in the first place? And why with > > such a short sleep? > > I was scared of the p

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-04-30 Thread Andres Freund
Hi, On 2021-04-08 17:03:41 +0530, Amit Kapila wrote: > I haven't tested the patch but I couldn't spot any problems while > reading it. A minor point, don't we need to use > ConditionVariableCancelSleep() at some point after doing > ConditionVariableTimedSleep? It's not really necessary - unless t

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-04-29 Thread Álvaro Herrera
On 2021-Apr-07, Andres Freund wrote: > I'm also confused by the use of ConditionVariableTimedSleep(timeout = > 10). Why do we need a timed sleep here in the first place? And why with > such a short sleep? I was scared of the possibility that a process would not set the CV for whatever reason, cau

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-04-08 Thread Amit Kapila
On Thu, Apr 8, 2021 at 7:39 AM Andres Freund wrote: > > On 2021-04-07 17:10:37 -0700, Andres Freund wrote: > > I think this can be solved in two different ways: > > > > 1) Hold ReplicationSlotAllocationLock with LW_SHARED across most of > >InvalidateObsoleteReplicationSlots(). That way nobody

Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-04-07 Thread Andres Freund
Hi, On 2021-04-07 17:10:37 -0700, Andres Freund wrote: > I think this can be solved in two different ways: > > 1) Hold ReplicationSlotAllocationLock with LW_SHARED across most of >InvalidateObsoleteReplicationSlots(). That way nobody could re-create a new >slot in the to-be-obsoleted-slot'

Race condition in InvalidateObsoleteReplicationSlots()

2021-04-07 Thread Andres Freund
Hi, I was looking at InvalidateObsoleteReplicationSlots() while reviewing / polishing the logical decoding on standby patch. Which lead me to notice that I think there's a race in InvalidateObsoleteReplicationSlots() (because ResolveRecoveryConflictWithLogicalSlots has a closely related one). voi