Re: Accept recovery conflict interrupt on blocked writing

2025-01-21 Thread Anthonin Bonnefoy
Hi, Thanks for the detailed explanations, I've definitely misinterpreted how interrupts and errors were handled. On Fri, Jan 17, 2025 at 7:03 PM Andres Freund wrote: > > Might be worth using it in src/test/postmaster/t/002_start_stop.pl? That > has e.g. code to send a startup message. I've c

Re: Accept recovery conflict interrupt on blocked writing

2025-01-17 Thread Andres Freund
Hi, On 2025-01-17 13:03:35 -0500, Andres Freund wrote: > > Previously, all interrupts except process dying were ignored while a > > process was blocked writing to a socket. If the connection to the client > > was broken (no clean FIN nor RST), a process sending results to the > > client could be s

Re: Accept recovery conflict interrupt on blocked writing

2025-01-17 Thread Andres Freund
Hi, On 2025-01-17 13:03:35 -0500, Andres Freund wrote: > I don't see anything implementing the promotion of ERRORs to FATAL? You're > preventing the error message being sent to the client, but I don't think that > causes the connection to be terminated. The pre-existing code doesn't have > that

Re: Accept recovery conflict interrupt on blocked writing

2025-01-17 Thread Andres Freund
Hi, On 2025-01-17 17:01:53 +0100, Anthonin Bonnefoy wrote: > I've cleaned up the tests: I've created a dedicated PgProto > (definitely open to suggestions for a better name...) module > containing all the helpers to send and receive messages on a raw > socket in 0001. Might be worth using it i

Re: Accept recovery conflict interrupt on blocked writing

2025-01-17 Thread Anthonin Bonnefoy
I've tested on PG16; the issue is indeed triggered, with replication blocked while the conflicting query is stuck in ClientWrite. I've cleaned up the tests: I've created a dedicated PgProto (definitely open to suggestions for a better name...) module containing all the helpers to send and rec

Re: Accept recovery conflict interrupt on blocked writing

2025-01-16 Thread Anthonin Bonnefoy
Bonjour Thomas, On Wed, Jan 15, 2025 at 6:21 AM Thomas Munro wrote: > Right. Before commit 0da096d7 in v17, the recovery conflict code > running in a signal handler would have set ProcDiePending, so this > looks like an unintended regression due to that commit. The issue is happening on instanc

Re: Accept recovery conflict interrupt on blocked writing

2025-01-14 Thread Thomas Munro
Bonjour Anthonin, On Mon, Jan 13, 2025 at 11:31 PM Anthonin Bonnefoy wrote: > To avoid blocking recovery for an extended period of time, this patch > changes client write interrupts by handling recovery conflict > interrupts instead of ignoring them. Since the interrupt happens while > we're like

Accept recovery conflict interrupt on blocked writing

2025-01-13 Thread Anthonin Bonnefoy
Hi, I have replicas that have regular transient bursts of replay lag (>5 minutes). The events have the following symptoms: - Replicas are using a physical replication slot and hot_standby_feedback - The lag recovers by itself after at most 15 minutes - During the same timeframe, there's a query stuc