Hi,

On 2025-01-17 17:01:53 +0100, Anthonin Bonnefoy wrote:
> I've cleaned up the tests: I've created a dedicated PgProto
> (definitely open to suggestions for a better name...) module
> containing all the helpers to send and receive messages on a raw
> socket in 0001.

Might be worth using it in src/test/postmaster/t/002_start_stop.pl? That
has e.g. code to send a startup message.


> Also, I think the DoingBlockingWrite variable was unnecessary? Calling
> ProcessRecoveryConflictInterrupts should be enough, as I don't think
> QueryCancelHoldoffCount can be >0 when writing on a socket, so it
> should always be able to cancel the statement.

I'd probably add an assertion to that effect, it's too easy for that to
change in a few years otherwise.
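Roughly along these lines - an untested sketch, placed in the
blocked-write path of ProcessClientWriteInterrupt(), with comment
wording that's mine:

	/*
	 * We should only get here, blocked while writing to the socket, with
	 * query cancel not held off.  Assert that, so a future change to the
	 * QueryCancelHoldoffCount handling trips this immediately instead of
	 * silently breaking cancellation.
	 */
	Assert(QueryCancelHoldoffCount == 0);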
> Previously, all interrupts except process dying were ignored while a
> process was blocked writing to a socket. If the connection to the
> client was broken (no clean FIN nor RST), a process sending results
> to the client could be stuck for 924s until the TCP retransmission
> timeout is reached. During this time, it was possible for the process
> to conflict with recovery: For example, the process returning results
> can have a conflicting buffer pin.
>
> To avoid blocking recovery for an extended period of time, this patch
> changes client write interrupts by handling recovery conflict
> interrupts instead of ignoring them. Since the interrupt happens
> while we're likely to have partially written results on the socket,
> there's no easy way to keep protocol sync, so the session needs to be
> terminated.

I don't see anything implementing the promotion of ERRORs to FATAL?
You're preventing the error message from being sent to the client, but I
don't think that causes the connection to be terminated. The
pre-existing code doesn't have that problem, because it's only active
when ProcDiePending is already set.
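Concretely, I'd expect the blocked-write conflict path to end up doing
something like the below - an untested sketch, with the errcode and
message wording as placeholders:

	if (whereToSendOutput == DestRemote)
		whereToSendOutput = DestNone;

	/*
	 * We likely wrote partial results to the socket, so there's no way
	 * to re-establish protocol sync - a plain ERROR would leave behind a
	 * connection that's useless to the client.  Terminate instead.
	 */
	ereport(FATAL,
			(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
			 errmsg("terminating connection due to conflict with recovery"),
			 errdetail("Results were partially written to the client; protocol synchronization cannot be restored.")));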
In fact, the test your patch added goes through
ProcessRecoveryConflictInterrupts() multiple times:

2025-01-17 12:52:47.842 EST [3376411] LOG:  recovery still waiting after 20.709 ms: recovery conflict on buffer pin
2025-01-17 12:52:47.842 EST [3376411] CONTEXT:  WAL redo at 0/3462288 for Heap2/PRUNE_VACUUM_SCAN: , isCatalogRel: F, nplans: 0, nredirected: 0, ndead: 0, nun>
3376451: recovery conflict interrupt while blocked
3376451: recovery conflict processing done
write(8192) = -1: 11/Resource temporarily unavailable
3376451: recovery conflict interrupt while blocked
3376451: recovery conflict processing done
...
write(8192) = -1: 11/Resource temporarily unavailable
3376451: recovery conflict interrupt while blocked
3376451: recovery conflict processing done
write(8192) = -1: 11/Resource temporarily unavailable
3376451: recovery conflict interrupt while blocked
2025-01-17 12:52:48.072 EST [3376451] 031_recovery_conflict.pl ERROR:  canceling statement due to conflict with recovery
2025-01-17 12:52:48.072 EST [3376451] 031_recovery_conflict.pl DETAIL:  User was holding shared buffer pin for too long.
2025-01-17 12:52:48.072 EST [3376451] 031_recovery_conflict.pl STATEMENT:  BEGIN; DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT b FROM test_recovery_conflict_table1; FETCH FORWARD FROM test_recovery_conflict_cursor; SELECT generate_series(1, 100000);
backend 3376451>
2025-01-17 12:52:48.072 EST [3376411] LOG:  recovery finished waiting after 250.681 ms: recovery conflict on buffer pin

I don't actually know why the conflict ends up being resolved after a
bunch of retries.

Note also the "backend>" prompt (to which I added the PID, to identify
the process) which gets emitted. Just setting whereToSendOutput =
DestNone has side effects when we're not actually on a process exit
path...


> @@ -581,6 +582,22 @@ ProcessClientWriteInterrupt(bool blocked)
>  		else
>  			SetLatch(MyLatch);
>  	}
> +	else if (blocked && RecoveryConflictPending)
> +	{
> +		/*
> +		 * We're conflicting with recovery while being blocked writing. This
> +		 * can happen when the process is returning results and no ACK is
> +		 * received (broken connection, client overloaded...), eventually
> +		 * saturating the socket buffer while the process holds a page pin
> +		 * that eventually conflicts with recovery.
> +		 */
> +		if (InterruptHoldoffCount == 0 && CritSectionCount == 0)

I was about to say that we shouldn't get here with CritSectionCount !=
0, but unfortunately I'm not sure that's true. Thinking of e.g. debug
messages or such. And it's of course a copy of existing code...

Playing around with this it's unfortunately worse - we very commonly get
to ProcessClientWriteInterrupt() with InterruptHoldoffCount > 0,
including just after processing recovery conflict interrupts. The reason
for that is that if we do throw an error, the first thing we do is to
hold interrupts (cf. sigsetjmp() in PostgresMain()) and then we call
EmitErrorReport().

Unfortunately I suspect that this means any protection we'd get from
this version of the patch is rather incomplete - if the recovery
conflict is triggered while not blocked, we'll ERROR out and report the
message with interrupts held. That cycle of recovery conflict signalling
wouldn't ever be resolved in such a case, I think. Of course we do
re-signal recovery conflict interrupts rather aggressively - but I'm not
sure that's going to help reliably.

I've previously suggested that we ought to report some errors to the
client by trying to send the data in a non-blocking way, and just
terminate if we can't send the data immediately. IIRC Tom was concerned
that that would lead to too many lost messages. Perhaps we could instead
implement this with a timer that terminates the connection when we get
stuck trying to write out an error message?


> +		{
> +			if (whereToSendOutput == DestRemote)
> +				whereToSendOutput = DestNone;
> +			ProcessRecoveryConflictInterrupts();
> +		}
> +	}

This largely duplicates the existing code in
ProcessClientWriteInterrupt(). Perhaps it's worth flipping the order of
the branches, so that we first check whether we're either not blocked or
have InterruptHoldoffCount != 0 || CritSectionCount != 0?
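E.g. something along these lines - an untested sketch just to show the
shape, keeping the existing behaviour for the die case and using
ProcessRecoveryConflictInterrupts() as in your patch:

void
ProcessClientWriteInterrupt(bool blocked)
{
	int			save_errno = errno;

	if (ProcDiePending || (blocked && RecoveryConflictPending))
	{
		if (blocked && InterruptHoldoffCount == 0 && CritSectionCount == 0)
		{
			/*
			 * Stop talking to the client before processing interrupts -
			 * with partially written output there's no way to stay in
			 * protocol sync.
			 */
			if (whereToSendOutput == DestRemote)
				whereToSendOutput = DestNone;

			if (ProcDiePending)
				ProcessInterrupts();
			else
				ProcessRecoveryConflictInterrupts();
		}
		else if (!blocked && ProcDiePending)
		{
			/*
			 * Not blocked (yet) - set the latch so the pending interrupt
			 * gets noticed the next time we wait.
			 */
			SetLatch(MyLatch);
		}
	}

	errno = save_errno;
}

Greetings,

Andres Freund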