Boszormenyi Zoltan <z...@cybertec.at> writes: > [ 2-lock_timeout-v33.patch ]
I looked at this patch a bit. I don't understand why you've chosen to alter the API of the enable_timeout variants to have a bool result that says "I didn't bother to process the timeout because it would have fired immediately". That is not an advantage for any call site that I can see: it just means that the caller needs an extra, very difficult to test code path to handle what would normally be handled by a timeout interrupt. Even if it were a good API choice, you've broken a number of existing call sites that the patch fails to touch (eg in postmaster.c and postgres.c). It's not acceptable to define a failure return from enable_timeout_after and then let callers assume that the failure can't happen. Please undo that. Also, I'm not really enamored of the choice to use List* infrastructure for enable_timeouts(). That doesn't appear to be especially convenient for any caller, and what it does do is create memory-leak risks for most of them (if they forget to free the list cells, perhaps as a result of an error exit). I think a local-variable array of TimeoutParams[] structs would serve better for most use-cases. Another thought is that the PGSemaphoreTimedLock implementations all share the same bug, which is that if the "condition" callback returns true immediately after we acquire the semaphore, they will all wrongly return false indicating that the semaphore wasn't acquired. (BTW, I do not like either the variable name "condition" or the typedef name "TimeoutCondition" for something that's actually a callback function pointer.) In the same vein, this comment change: * NOTE: Think not to put any shared-state cleanup after the call to * ProcSleep, in either the normal or failure path. The lock state must ! * be fully set by the lock grantor, or by CheckDeadLock if we give up ! * waiting for the lock. This is necessary because of the possibility ! * that a cancel/die interrupt will interrupt ProcSleep after someone else ! * grants us the lock, but before we've noticed it. Hence, after granting, ! * the locktable state must fully reflect the fact that we own the lock; ! * we can't do additional work on return. * * We can and do use a PG_TRY block to try to clean up after failure, but * this still has a major limitation: elog(FATAL) can occur while waiting --- 1594,1606 ---- /* * NOTE: Think not to put any shared-state cleanup after the call to * ProcSleep, in either the normal or failure path. The lock state must ! * be fully set by the lock grantor, or by CheckDeadLock or by ProcSleep ! * itself in case a timeout is detected or if we give up waiting for the lock. ! * This is necessary because of the possibility that a cancel/die interrupt ! * will interrupt ProcSleep after someone else grants us the lock, but ! * before we've noticed it. Hence, after granting, the locktable state must ! * fully reflect the fact that we own the lock; we can't do additional work ! * on return. * suggests that you've failed to grasp the fundamental race-condition problem here. ProcSleep can't do cleanup after the fact any more than its callers can, because otherwise it has a race condition with some other process deciding to grant it the lock at about the same time its timeout goes off. I think the changes in ProcSleep that alter the state-cleanup behavior are just plain wrong, and what you need to do is make that look more like the existing mechanisms that clean up when statement timeout occurs. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers