On Tue, Mar 28, 2017 at 9:23 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Tue, Mar 28, 2017 at 2:36 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Hm ... I don't see a crash here, but I wonder whether you have parameters
>>> set that would cause this query to be run as a parallel query?  Because
>>> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
>>> probably insane.
>
>> /me blinks
>
>> Uh, what's insane about that?  All it does is test a GUC (which is
>> surely parallel-safe) and call SendPostmasterSignal (which seems safe,
>> too).
>
> Well, if you don't like that theory, what's yours?

Gremlins?

The stack trace seems to show that the process is receiving SIGUSR1 at
a very high rate.  Every time sigusr1_handler() reaches
PG_SETMASK(&UnBlockSig), it immediately gets a SIGUSR1 and jumps back
into sigusr1_handler().

Now, this seems like a design flaw in sigusr1_handler().  Likely the
operating system blocks SIGUSR1 on entry to the signal handler so that
it's not possible for a high rate of signal delivery to blow out the
stack, but we forcibly unblock it before returning, thus exposing
ourselves to blowing out the stack.  And we have, apparently, no stack
depth check here, nor any other way of preventing the infinite
recursion.

I imagine the behavior here is platform-dependent, but I'd guess that

select pg_rotate_logfile() from generate_series(1,1000000) g;

might reproduce this on affected platforms, with or without parallel
query in the mix.  It looks like we've conveniently provided both a
function that can be used to SIGUSR1 the heck out of the postmaster
and a postmaster that is, at least on such platforms, vulnerable to
crashing if you do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company