What's the timing of the errors? Is there a chance that we are sending
the kill signal before the signal handling thread has actually started
*and created the named pipe*? 

We set up the signal handling stuff pretty early, but we do seem to let
the postmaster continue its work before it's up...

Under heavy load, a signal will typically be dropped within the first
few minutes.  However, it can sometimes take a little while before the
problem happens.  Thousands of the same signal to the same process may
be properly handled before one is mishandled.  This is not consistent
with a problem with initial creation of the pipe.

Going back to your tests, did it ever require more than one retry?

Yes, but rarely. In a 90 hour stress test with code that allowed up to 5
calls to CallNamedPipe, I found 760 signals that required a retry.  Only
one required two retries.  That is why I set the number of retries to 2.
The behavior might be different if the sleep interval between retries
was changed.  I used a 20 ms sleep interval between retries in all my
tests, and in the patch I sent.
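
For anyone following along, here is a rough, portable sketch of the
retry shape described above: try once, and on failure sleep and try
again, up to a fixed number of retries.  This is not the patch itself.
The names send_with_retry, attempt_fn, sleep_fn, flaky_attempt and
fake_sleep are made up for illustration; in the real code the attempt
would be the CallNamedPipe call and the sleep a 20 ms wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types; these are not PostgreSQL names. */
typedef bool (*attempt_fn) (void *ctx);		/* true on success */
typedef void (*sleep_fn) (unsigned int ms);

/*
 * Call attempt() once, then retry up to max_retries more times,
 * sleeping sleep_ms between attempts.  Returns the number of retries
 * that were needed (0 = first call succeeded), or -1 if every attempt
 * failed.
 */
static int
send_with_retry(attempt_fn attempt, void *ctx, int max_retries,
				unsigned int sleep_ms, sleep_fn do_sleep)
{
	int			retries;

	assert(attempt != NULL && do_sleep != NULL);

	for (retries = 0;; retries++)
	{
		if (attempt(ctx))
			return retries;
		if (retries >= max_retries)
			return -1;			/* out of retries, give up */
		do_sleep(sleep_ms);
	}
}

/* Test stand-in for the pipe call: fails the first N times. */
struct flaky
{
	int			failures_left;
	int			calls;
};

static bool
flaky_attempt(void *ctx)
{
	struct flaky *f = (struct flaky *) ctx;

	f->calls++;
	if (f->failures_left > 0)
	{
		f->failures_left--;
		return false;
	}
	return true;
}

/* Test stand-in for the sleep: only records the requested time. */
static unsigned int slept_ms_total = 0;

static void
fake_sleep(unsigned int ms)
{
	slept_ms_total += ms;
}
```

With max_retries = 2 and sleep_ms = 20 this makes at most three calls,
which matches the settings I described; a different sleep interval
could change how often a second retry is needed.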



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
