Hi,

On 2022-02-18 18:49:14 -0500, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
> > We'd also need to add pq_endmessage_noblock(), because the pq_endmessage()
> > obviously tries to send (as in the backtrace upthread) if the output buffer 
> > is
> > large enough, which it often will be in walsender.
> 
> I don't see that as "obvious".  If we're there, we do not have an
> error situation.

The problem is that due walsender using pq_putmessage_noblock(), the output
buffer is often going to be too full for a plain pq_endmessage() to not send
out data.  Because walsender will have an output buffer > PQ_SEND_BUFFER_SIZE
a lot of the time, errors will commonly be thrown with the output buffer
full.

That leads the pq_endmessage() in send_message_to_frontend() to block sending
the error message and the preceding WAL data. Leading to backtraces like:

> #0  0x00007faf4a570ec6 in epoll_wait (epfd=4, events=0x7faf4c201458, 
> maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
> #1  0x00007faf4b8ced5c in WaitEventSetWaitBlock (set=0x7faf4c2013e0, 
> cur_timeout=-1, occurred_events=0x7ffe47df2320, nevents=1)
>     at /home/andres/src/postgresql/src/backend/storage/ipc/latch.c:1465
> #2  0x00007faf4b8cebe3 in WaitEventSetWait (set=0x7faf4c2013e0, timeout=-1, 
> occurred_events=0x7ffe47df2320, nevents=1, wait_event_info=100663297)
>     at /home/andres/src/postgresql/src/backend/storage/ipc/latch.c:1411
> #3  0x00007faf4b70f48b in secure_write (port=0x7faf4c22da50, 
> ptr=0x7faf4c2f1210, len=21470) at 
> /home/andres/src/postgresql/src/backend/libpq/be-secure.c:298
> #4  0x00007faf4b71aecb in internal_flush () at 
> /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1380
> #5  0x00007faf4b71ada1 in internal_putbytes (s=0x7ffe47df23dc "E\177", len=1) 
> at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1326
> #6  0x00007faf4b71b0cf in socket_putmessage (msgtype=69 'E', s=0x7faf4c201700 
> "SFATAL", len=112)
>     at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1507
> #7  0x00007faf4b71c151 in pq_endmessage (buf=0x7ffe47df2460) at 
> /home/andres/src/postgresql/src/backend/libpq/pqformat.c:301
> #8  0x00007faf4babbb5e in send_message_to_frontend (edata=0x7faf4be2f1c0 
> <errordata>) at 
> /home/andres/src/postgresql/src/backend/utils/error/elog.c:3253
> #9  0x00007faf4bab8aa0 in EmitErrorReport () at 
> /home/andres/src/postgresql/src/backend/utils/error/elog.c:1541
> #10 0x00007faf4bab5ec0 in errfinish (filename=0x7faf4bc9770d "postgres.c", 
> lineno=3192, funcname=0x7faf4bc99170 <__func__.8> "ProcessInterrupts")
>     at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592
> #11 0x00007faf4b907e73 in ProcessInterrupts () at 
> /home/andres/src/postgresql/src/backend/tcop/postgres.c:3192
> #12 0x00007faf4b8920af in WalSndLoop (send_data=0x7faf4b8927f2 
> <XLogSendPhysical>) at 
> /home/andres/src/postgresql/src/backend/replication/walsender.c:2404





> > I guess we could try to flush in a blocking manner sometime later in the
> > shutdown sequence, after we've released resources? But I'm doubtful it's a
> > good idea, we don't really want to block waiting to exit when e.g. the 
> > network
> > connection is dead without the TCP stack knowing.
> 
> I think you are trying to move in the direction of possibly exiting
> without ever sending at all, which does NOT seem like an improvement.

I was just talking about doing another blocking pq_flush(), in addition to the
pq_flush_if_writable() earlier. That'd be trying harder to send out data, not
less hard...

Greetings,

Andres Freund


Reply via email to