Re: [HACKERS] Archiver not exiting upon crash

2012-05-25 Thread Tom Lane
Jeff Janes writes: > So my test harness is an inexplicably effective show-case for the > vulnerability, but it is not the reason the vulnerability should be > fixed. I spent a bit of time looking into this. In principle the postmaster could be fixed to repeat the SIGQUIT signal every second or s

Re: [HACKERS] Archiver not exiting upon crash

2012-05-24 Thread Tom Lane
Jeff Janes writes: > On Wed, May 23, 2012 at 2:21 PM, Tom Lane wrote: >> However, I remain unsatisfied with this idea as an explanation for the >> behavior you're seeing. In the first place, that race condition window >> ought not be wide enough to allow failure probabilities as high as 10%. >>

Re: [HACKERS] Archiver not exiting upon crash

2012-05-24 Thread Jeff Janes
On Wed, May 23, 2012 at 2:21 PM, Tom Lane wrote: > I wrote: >> Jeff Janes writes: >>> But what happens if the SIGQUIT is blocked before the system(3) is >>> invoked?  Does the ignore take precedence over the block, or does the >>> block take precedence over the ignore, and so the signal is still

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Tom Lane
I wrote: > Jeff Janes writes: >> But what happens if the SIGQUIT is blocked before the system(3) is >> invoked? Does the ignore take precedence over the block, or does the >> block take precedence over the ignore, and so the signal is still >> waiting once the block is reversed after the system(3
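
A minimal sketch of the "block" half of that question, written in Python rather than taken from PostgreSQL: with a handler installed (not SIG_IGN), a SIGQUIT sent while the signal is blocked stays pending and is delivered once the mask is restored. The function name blocked_sigquit_stays_pending is hypothetical. This does not settle the ignore-plus-block case asked about above; as far as I know POSIX leaves it unspecified whether a blocked signal whose action is "ignore" is discarded on generation or remains pending, so that case may differ between platforms.

```python
import signal
import threading
import time


def blocked_sigquit_stays_pending():
    """Return (handler calls while blocked, was it pending, handler calls after unblock)."""
    received = []

    def on_sigquit(signum, frame):
        received.append(signum)

    # Must run in the main thread: Python only allows signal.signal() there.
    original = signal.signal(signal.SIGQUIT, on_sigquit)
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGQUIT})
    try:
        # Send SIGQUIT to this thread while it is blocked.
        signal.pthread_kill(threading.get_ident(), signal.SIGQUIT)
        time.sleep(0.01)
        seen_while_blocked = len(received)
        was_pending = signal.SIGQUIT in signal.sigpending()

        # Restoring the old mask lets the pending signal through.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        time.sleep(0.01)
        seen_after_unblock = len(received)
    finally:
        # Unblock first so nothing is still pending when the handler changes.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        signal.signal(signal.SIGQUIT, original)
    return seen_while_blocked, was_pending, seen_after_unblock
```

Blocking only defers delivery, which is why a signal that is merely blocked cannot be "lost" the way an ignored one can.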

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Tom Lane
Jeff Janes writes: > On Wed, May 23, 2012 at 1:10 PM, Tom Lane wrote: >> On my machine, man system(3) saith: >> >> system() ignores the SIGINT and SIGQUIT signals, and blocks the >> SIGCHLD signal, while waiting for the command to terminate. If this >> might cause the application to

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Jeff Janes
On Wed, May 23, 2012 at 1:10 PM, Tom Lane wrote: > Jeff Janes writes: >> It looks to me like the SIGQUIT from the postmaster is simply getting >> lost.  And from what little I understand of signal handling, this is a >> known race with system(3).  The archive_command, child of archiver, >> exits

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Tom Lane
I wrote: > On my machine, man system(3) saith: > system() ignores the SIGINT and SIGQUIT signals, and blocks the > SIGCHLD signal, while waiting for the command to terminate. If this > might cause the application to miss a signal that would have killed > it, the application sh
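
To illustrate the man page text quoted above, here is a small sketch in Python (not PostgreSQL code) of what happens to a SIGQUIT that arrives while its disposition is "ignore", as it is in the caller during system(3). The function name sigquit_lost_while_ignored is hypothetical, and the sketch only imitates the ignore/restore steps; it does not call system() or reproduce the archiver.

```python
import os
import signal
import time


def sigquit_lost_while_ignored():
    """Return (lost, delivered) for a SIGQUIT sent while ignored, then while handled."""
    received = []

    def on_sigquit(signum, frame):
        received.append(signum)

    # Must run in the main thread: Python only allows signal.signal() there.
    original = signal.signal(signal.SIGQUIT, on_sigquit)
    try:
        # Imitate entry to system(3): the caller's SIGQUIT action becomes "ignore".
        saved = signal.signal(signal.SIGQUIT, signal.SIG_IGN)
        # A SIGQUIT arriving in this window is discarded, not queued.
        os.kill(os.getpid(), signal.SIGQUIT)
        # Imitate return from system(3): the previous handler is restored.
        signal.signal(signal.SIGQUIT, saved)
        time.sleep(0.01)
        lost = len(received) == 0

        # Control case: the same signal with the handler in place is delivered.
        os.kill(os.getpid(), signal.SIGQUIT)
        time.sleep(0.01)
        delivered = len(received) == 1
    finally:
        signal.signal(signal.SIGQUIT, original)
    return lost, delivered
```

Restoring the handler does not bring the discarded signal back, so a process that was told to quit only once during that window never hears about it.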

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Tom Lane
Jeff Janes writes: > It looks to me like the SIGQUIT from the postmaster is simply getting > lost. And from what little I understand of signal handling, this is a > known race with system(3). The archive_command, child of archiver, > exits before it can receive the signal sent to the entire arch

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Jeff Janes
On Mon, May 21, 2012 at 9:22 AM, Fujii Masao wrote: > On Sat, May 19, 2012 at 1:23 AM, Jeff Janes wrote: >> I've been testing the crash recovery of REL9_2_BETA1, using the same >> method I posted in the "Scaling XLog insertion" thread.  I have the >> checkpointer occasionally throw a FATAL error,

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Fujii Masao
On Thu, May 24, 2012 at 1:26 AM, Tom Lane wrote: > Peter Eisentraut writes: >> On mån, 2012-05-21 at 13:14 -0400, Tom Lane wrote: >>> ... wait, scratch that.  AFAICS, that commit was totally useless, >>> because BlockSig should always already contain SIGQUIT. > >> No, because PostgresMain() delet

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Tom Lane
Peter Eisentraut writes: > On mån, 2012-05-21 at 13:14 -0400, Tom Lane wrote: >> ... wait, scratch that. AFAICS, that commit was totally useless, >> because BlockSig should always already contain SIGQUIT. > No, because PostgresMain() deletes it from BlockSig. Ah. So potentially we have an iss

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Peter Eisentraut
On mån, 2012-05-21 at 13:14 -0400, Tom Lane wrote: > > ... but having said that, I see Peter's commit > > d6de43099ac0bddb4b1da40088487616da892164 only touched postgres.c's > > quickdie(), and not all the *other* background processes with > identical > > coding. That seems a clear oversight, so I

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Peter Eisentraut
On mån, 2012-05-21 at 12:52 -0400, Tom Lane wrote: > I see Peter's commit d6de43099ac0bddb4b1da40088487616da892164 only > touched postgres.c's quickdie(), and not all the *other* background > processes with identical coding. That seems a clear oversight, so I > will go fix it. None[*] of the othe

Re: [HACKERS] Archiver not exiting upon crash

2012-05-21 Thread Tom Lane
I wrote: > ... but having said that, I see Peter's commit > d6de43099ac0bddb4b1da40088487616da892164 only touched postgres.c's > quickdie(), and not all the *other* background processes with identical > coding. That seems a clear oversight, so I will go fix it. Doesn't > explain why the archiver

Re: [HACKERS] Archiver not exiting upon crash

2012-05-21 Thread Tom Lane
I wrote: > Fujii Masao writes: >> You might have gotten the following problem which was discussed before. >> This problem was fixed in SIGQUIT signal handler of a backend, but ISTM >> not that of an archiver. >> http://archives.postgresql.org/pgsql-admin/2009-11/msg00088.php > pgarch.c's SIGQUIT

Re: [HACKERS] Archiver not exiting upon crash

2012-05-21 Thread Tom Lane
Fujii Masao writes: > You might have gotten the following problem which was discussed before. > This problem was fixed in SIGQUIT signal handler of a backend, but ISTM > not that of an archiver. > http://archives.postgresql.org/pgsql-admin/2009-11/msg00088.php pgarch.c's SIGQUIT handler just does

Re: [HACKERS] Archiver not exiting upon crash

2012-05-21 Thread Tom Lane
Jeff Janes writes: > ... sometimes the automatic recovery never initiates. It looks > like the postmaster is waiting for the archiver to exit before it > starts recovery, and the archiver is waiting for something, I don't > really know what. Can you try poking into the archiver's state with gdb?

Re: [HACKERS] Archiver not exiting upon crash

2012-05-21 Thread Fujii Masao
On Sat, May 19, 2012 at 1:23 AM, Jeff Janes wrote: > I've been testing the crash recovery of REL9_2_BETA1, using the same > method I posted in the "Scaling XLog insertion" thread.  I have the > checkpointer occasionally throw a FATAL error, We should also fix this problem? If yes, could you show

[HACKERS] Archiver not exiting upon crash

2012-05-18 Thread Jeff Janes
I've been testing the crash recovery of REL9_2_BETA1, using the same method I posted in the "Scaling XLog insertion" thread. I have the checkpointer occasionally throw a FATAL error, which causes the postmaster to take down all of the other processes (DETAIL: The postmaster has commanded this ser