On Thu, Jan 31, 2019 at 8:29 PM David Rowley <david.row...@2ndquadrant.com> wrote: > I think perhaps accepting invalidations at the start of the statement > might appear to fix the problem in master, but I think there's still a > race condition around CheckCachedPlan() since we'll ignore > invalidation messages when we perform LockRelationOid() in > AcquireExecutorLocks() due to the lock already being held. So there's > likely still a window where the invalidation message could come in and > we miss it. That said, if lock is already taken in one session, and > some other session alters some relation we have a lock on, then that > alternation can't have been done with an AEL, as that would have > blocked on our lock, so it must be some ShareUpdateExclusiveLock > change, and if so that change must not be important enough to block > concurrently executing queries, so likely shouldn't actually break > anything. > > So in summary, I think that checking for invalidation messages at the > start of the statement allows us to replan within a transaction, but > does not guarantee we can the exact correct plan for the current > settings.
Right, but that's also an inevitably consequence of postponing locking. If you start running the statement without locking every object involved in the query, then somebody could be holding an AccessExclusiveLock on that object and changing it in any way that they're allowed to do without holding a lock that does conflict with one you're holding. Nothing we do with invalidation messages can prevent that from happening, though it can make the race narrower. We *must* be able to cope with having the out-of-date plan or this whole idea is sunk. > I also don't have a full grasp on why we must ignore invalidation > messages when we already have a lock on the relation. I assume it's > not a huge issue since obtaining a new lock on any other relation will > process the invalidation queue anyway. I know that f868a8143a9 > recently made some changes to fix some recursive issues around these > invalidations. It's not necessary. It's a performance optimization. The idea is: we need our caches to be up to date. Anything that changes an object should take AccessExclusiveLock, send invalidation messages, and only then release the AccessExclusiveLock. Anything that looks at an object should acquire at least AccessShareLock and then process invalidation messages. This protocol guarantees that any changes that are made under AEL will be seen in a race-free way, since the invalidation messages are sent before releasing the lock and processed after taking one. But it doesn't require processing those messages when we haven't taken a lock, so we don't. > (I admit to not having the best grasp on how all this works, so feel > free to shoot this down) I think the key question here is whether or not you can cope with someone having done arbitrary AEL-requiring modifications to the delaylocked partitions. If you can, the fact that the plan might sometimes be out-of-date is an inevitable consequence of doing this at all, but I think we can argue that it's an acceptable consequence as long as the resulting behavior is not too bletcherous. If you can't, then this patch is dead. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company