On Thursday 07 January 2010 14:45:55 Joachim Wieland wrote:
> On Thu, Dec 31, 2009 at 6:40 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> >> Building racy infrastructure when it can be avoided with a little care
> >> still seems not to be the best path to me.
> >
> > Doing that will add more complexity in an area that is hard to test
> > effectively. I think the risk of introducing further bugs while trying
> > to fix this rare condition is high. Right now the conflict processing
> > needs more work and is often much less precise than this, so improving
> > this aspect of it would not be a priority. I've added it to the TODO
> > though. Thank you for your research.
> >
> > Patch implements recovery conflict signalling using SIGUSR1
> > multiplexing, then uses a SessionCancelPending mode similar to Joachim's
> > TransactionCancelPending.
>
> I have reworked Simon's patch a bit and attach the result.
>
> Quick facts:
>
> - Hot Standby only uses SIGUSR1
> - SIGINT behaves as it did before: it only cancels running statements
> - pg_cancel_backend() continues to use SIGINT
> - I added pg_cancel_idle_transaction() to cancel an idle transaction via
>   SIGUSR1
> - One central function HandleCancelAction() sets the flags before calling
>   ProcessInterrupts(), it is called from the different signal handlers and
>   receives parameters about what it should do
> - If a SIGUSR1 reason is used that will cancel something, ProcArrayLock is
>   acquired until the signal has been sent to make sure that we won't signal
>   the wrong backend. Does this sufficiently cover the concerns of Andres
>   Freund discussed upthread?

I think it solves the major concern (which I btw could easily reproduce using
software that is in production) but unfortunately not completely.

The avoided situation is:

C (Client): BEGIN; SELECT WHATEVER;
S (Standby): conflict with C
S: Starting to cancel C
C: COMMIT
S: Sending signal to C
C: Wrong transaction is aborted

The situation not avoided is:

C: BEGIN; SELECT ...
S: conflict with C, lock procarray, send signal (that's asynchronous),
   unlock procarray
C: COMMIT; BEGIN
C: Signal arrives
C: Wrong transaction is killed

It should be easy to fix this by having a cancel_localTransactionId field in
the procarray which gets cleared upon transaction/backend start and gets
checked in the signal handler (it should be cast to sig_atomic_t).

I will cook up a patch if nobody objects to something like that.

Andres
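
For illustration, here is a minimal, self-contained sketch of the check
proposed above. It is not PostgreSQL code: BackendSlot, my_slot and all
function names are hypothetical, 0 is assumed to mean "no transaction", and a
plain assignment stands in for the write that the standby would do while
holding ProcArrayLock. It only shows how a handler could refuse a cancel
request that was aimed at a transaction which has already ended.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-backend slot; a stand-in for a procarray entry. */
typedef struct BackendSlot
{
    volatile sig_atomic_t local_xid;        /* running local transaction, 0 = none */
    volatile sig_atomic_t cancel_local_xid; /* transaction to cancel, 0 = none */
    volatile sig_atomic_t cancel_pending;   /* set by the handler if the cancel applies */
} BackendSlot;

static BackendSlot my_slot;

/* Handler: honour the request only if it names the transaction still running. */
static void
conflict_signal_handler(int signo)
{
    (void) signo;
    if (my_slot.cancel_local_xid != 0 &&
        my_slot.cancel_local_xid == my_slot.local_xid)
        my_slot.cancel_pending = 1;
    my_slot.cancel_local_xid = 0;   /* request consumed either way */
}

/* Install the handler with sigaction() so it stays installed across deliveries. */
static void
install_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = conflict_signal_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);
}

/* Backend side: a new transaction clears any stale cancel request. */
static void
start_transaction(sig_atomic_t xid)
{
    my_slot.cancel_local_xid = 0;
    my_slot.cancel_pending = 0;
    my_slot.local_xid = xid;
}

static void
commit_transaction(void)
{
    my_slot.local_xid = 0;
}

/* Standby side: record the target (assumed to happen under ProcArrayLock). */
static void
request_cancel(sig_atomic_t target_xid)
{
    my_slot.cancel_local_xid = target_xid;
}
```

Replaying the second scenario with this sketch: after start_transaction(1)
and request_cancel(1), the client runs commit_transaction() and
start_transaction(2) before raise(SIGUSR1) delivers the signal. The handler
then finds no matching request and leaves cancel_pending at 0, so the new
transaction survives. If the signal arrives while transaction 1 is still
running, cancel_pending is set as intended.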