Hi,
Thanks for the detailed explanations; I had definitely misinterpreted
how interrupts and errors are handled.
On Fri, Jan 17, 2025 at 7:03 PM Andres Freund wrote:
>
> Might be worth using it in src/test/postmaster/t/002_start_stop.pl? That
> has e.g. code to send a startup message.
I've c
Hi,
On 2025-01-17 13:03:35 -0500, Andres Freund wrote:
> > Previously, all interrupts except process dying were ignored while a
> > process was blocked writing to a socket. If the connection to the client
> > was broken (no clean FIN nor RST), a process sending results to the
> > client could be s
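
For illustration, here is a minimal, self-contained C sketch of the
behavior described in that paragraph: a write loop that, while the socket
is full, reacts only to the "process dying" flag and leaves every other
interrupt pending. The declarations below are local stand-ins for the
backend's real flag and wait machinery, not the actual code path.

#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Local stand-in for the backend's ProcDiePending flag. */
static volatile sig_atomic_t ProcDiePending = 0;

/*
 * Push a result buffer out on a non-blocking client socket.  While the
 * socket is full, only "process dying" is acted upon; query cancel and
 * recovery conflict interrupts stay pending until the write completes,
 * which is what lets a stuck client hold up recovery.
 */
static int
client_write_old_behavior(int sock, const char *buf, size_t len)
{
    while (len > 0)
    {
        ssize_t     n = send(sock, buf, len, 0);

        if (n > 0)
        {
            buf += n;
            len -= (size_t) n;
            continue;
        }
        if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
        {
            struct pollfd pfd = {.fd = sock, .events = POLLOUT};

            if (ProcDiePending)
                return -1;          /* caller terminates the backend */
            (void) poll(&pfd, 1, 1000);     /* wait for the socket to drain */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        return -1;                  /* connection is broken */
    }
    return 0;
}
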
Hi,
On 2025-01-17 13:03:35 -0500, Andres Freund wrote:
> I don't see anything implementing the promotion of ERRORs to FATAL? You're
> preventing the error message being sent to the client, but I don't think that
> causes the connection to be terminated. The pre-existing code doesn't have
> that
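
To make the promotion being asked about concrete, a toy C sketch could
look like the following; the enum, flag, and function are illustrative
stand-ins, not PostgreSQL's elog machinery.

#include <stdbool.h>

/* Illustrative stand-ins for error severity levels and backend state. */
typedef enum
{
    SKETCH_ERROR,
    SKETCH_FATAL
} SketchLevel;

static bool blocked_in_client_write = false;    /* hypothetical flag */

/*
 * An ERROR raised while the backend is stuck writing to the client cannot
 * be delivered over that socket anyway, so it is bumped to FATAL: the
 * backend then exits and the connection is torn down, instead of looping
 * forever trying to send the error message to an unreachable client.
 */
static SketchLevel
promote_if_write_blocked(SketchLevel elevel)
{
    if (elevel == SKETCH_ERROR && blocked_in_client_write)
        return SKETCH_FATAL;
    return elevel;
}
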
Hi,
On 2025-01-17 17:01:53 +0100, Anthonin Bonnefoy wrote:
> I've cleaned up the tests: I've created a dedicated PgProto
> (definitely open to suggestions for a better name...) module
> containing all the helpers to send and receive messages on a raw
> socket in 0001.
Might be worth using it in src/test/postmaster/t/002_start_stop.pl? That
has e.g. code to send a startup message.
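
For reference, the startup message mentioned here is the protocol 3.0
handshake: an Int32 length (including itself), an Int32 version number
(196608, i.e. 3.0), then NUL-terminated parameter name/value pairs closed
by one extra NUL byte. A rough C equivalent of what such a raw-socket
helper sends (the function and buffer handling below are illustrative,
not the module's Perl code):

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Send a protocol 3.0 StartupMessage carrying only the "user" and
 * "database" parameters on an already-connected socket.  No overflow
 * checks, for brevity.
 */
static int
send_startup_message(int sock, const char *user, const char *database)
{
    char        buf[1024];
    uint32_t    word;
    size_t      off = 4;            /* leave room for the length word */

    word = htonl(196608);           /* protocol version 3.0 */
    memcpy(buf + off, &word, 4);
    off += 4;

    /* NUL-terminated name/value pairs */
    off += 1 + (size_t) snprintf(buf + off, sizeof(buf) - off, "user");
    off += 1 + (size_t) snprintf(buf + off, sizeof(buf) - off, "%s", user);
    off += 1 + (size_t) snprintf(buf + off, sizeof(buf) - off, "database");
    off += 1 + (size_t) snprintf(buf + off, sizeof(buf) - off, "%s", database);
    buf[off++] = '\0';              /* terminates the parameter list */

    word = htonl((uint32_t) off);   /* message length, including itself */
    memcpy(buf, &word, 4);

    return send(sock, buf, off, 0) == (ssize_t) off ? 0 : -1;
}
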
I've tested on PG16; the issue is indeed triggered, with replication
blocked while the conflicting query is stuck in ClientWrite.
I've cleaned up the tests: I've created a dedicated PgProto
(definitely open to suggestions for a better name...) module
containing all the helpers to send and rec
Bonjour Thomas,
On Wed, Jan 15, 2025 at 6:21 AM Thomas Munro wrote:
> Right. Before commit 0da096d7 in v17, the recovery conflict code
> running in a signal handler would have set ProcDiePending, so this
> looks like an unintended regression due to that commit.
The issue is happening on instanc
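
In other words, before that commit the conflict could end up looking like
a "process dying" event to the blocked write loop. A schematic C
illustration of that pre-v17 shape (heavily simplified; the flag below is
a local stand-in for the backend's real one):

#include <signal.h>

/* Local stand-in for the backend's ProcDiePending flag. */
static volatile sig_atomic_t ProcDiePending = 0;

/*
 * Pre-v17 shape of the idea: the recovery-conflict signal handler could
 * decide the backend had to die and set the flag directly, so even a
 * write loop that only honors "process dying" while blocked would react.
 * With that decision moved out of the handler, the flag is no longer set
 * here and a blocked write keeps waiting.
 */
static void
recovery_conflict_signal_handler(int signo)
{
    (void) signo;
    ProcDiePending = 1;     /* picked up the next time the write loop checks */
}
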
Bonjour Anthonin,
On Mon, Jan 13, 2025 at 11:31 PM Anthonin Bonnefoy wrote:
> To avoid blocking recovery for an extended period of time, this patch
> changes client write interrupt handling to process recovery conflict
> interrupts instead of ignoring them. Since the interrupt happens while
> we're like
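
As a sketch of the intended behavior (stand-in names, not the patch
itself): while blocked in a client write, a pending recovery conflict is
acted on instead of being left pending, and nothing further is written to
the stuck socket.

#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Local stand-ins for backend interrupt flags and output state. */
static volatile sig_atomic_t RecoveryConflictPending = 0;
static bool output_suppressed = false;

/*
 * Called from the blocked-write wait loop: if a recovery conflict has
 * arrived, stop talking to the client (its socket is the thing that is
 * stuck) and terminate, so any locks are released and WAL replay can
 * continue.
 */
static void
handle_conflict_during_client_write(void)
{
    if (!RecoveryConflictPending)
        return;

    RecoveryConflictPending = 0;
    output_suppressed = true;   /* never try to send the error to the socket */

    /* The backend would use ereport(); stderr keeps this sketch standalone. */
    fprintf(stderr, "FATAL:  terminating connection due to conflict with recovery\n");
    exit(1);
}
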
Hi,
I have replicas that regularly experience transient bursts of replay lag
(>5 minutes). The events have the following symptoms:
- Replicas use a physical replication slot and hot_standby_feedback
- The lag recovers by itself after at most 15 minutes
- During the same timeframe, there's a query stuc