Re: Async-unsafe functions in signal handlers

Andres Freund Fri, 27 Aug 2021 14:06:05 -0700

Hi,

On 2021-08-27 23:51:27 +0300, Denis Smirnov wrote:
> > 26 авг. 2021 г., в 23:39, Tom Lane <t...@sss.pgh.pa.us> написал(а):
> > 
> > (BTW, I think it's pretty silly to imagine that adding backtrace()
> > calls inside ereport is making things any more dangerous.  ereport
> > has pretty much always carried a likelihood of calling malloc(),
> > for example.)
> 
> I have taken a look through the signal handlers and found out that many of 
> them use malloc() via ereport() and elog(). Here is the list:
> 
> SIGUSR1
> - procsignal_sigusr1_handler(): autoprewarm, autovacuum, bgworker, bgwriter, 
> checkpointer, pgarch, startup, walwriter, walreciever, walsender


There shouldn't be meaningful uses of elog/ereport() inside
procsignal_sigusr1_handler(). The exception I found was an elog(FATAL) for
unreachable code.


> - sigusr1_handler(): postmaster
> SIGHUP:
> - SIGHUP_handler(): postmaster
> SIGCHLD:
> - reaper(): postmaster

I think these runs in a very controlled set of circumstances because most of
postmaster runs with signals masked.


> SIGFPE:
> - FloatExceptionHandler(): autovacuum, bgworker, postgres, plperl

Yep, although as discussed this might not be a "real" problem because it
should only run during an instruction triggering an FPE.


> SIGQUIT:
> - quickdie(): postgres

Yes, this is an issue. I've previously argued for handling this via write()
and _exit(), instead of the full ereport() machinery. However, we have a
bandaid that deals with possible hangs, by SIGKILLing when processes don't
shut down (at that point things have already gone quite south, so that's not
an issue).


> SIGTERM:
> - bgworker_die(): bgworker

Bad.


> SIGALRM:
> - handle_sig_alarm(): autovacuum, bgworker, postmaster, startup, walsender, 
> postgres

I don't think there reachable elogs in there. I'm not concerned about e.g.
                elog(FATAL, "timeout index %d out of range 0..%d", index,
                         num_active_timeouts - 1);
because that's not something that should ever be reachable in a production
scenario. If it is, there's bigger problems.


Perhaps we ought to have a version of Assert() that's enabled in production
builds as well, and that outputs the error messages via write() and then
_exit()s?

Greetings,

Andres Freund

Re: Async-unsafe functions in signal handlers

Reply via email to