Re: Delay locking partitions during query execution

Robert Haas Fri, 01 Feb 2019 05:52:56 -0800

On Thu, Jan 31, 2019 at 8:29 PM David Rowley
<[email protected]> wrote:
> I think perhaps accepting invalidations at the start of the statement
> might appear to fix the problem in master, but I think there's still a
> race condition around CheckCachedPlan() since we'll ignore
> invalidation messages when we perform LockRelationOid() in
> AcquireExecutorLocks() due to the lock already being held.  So there's
> likely still a window where the invalidation message could come in and
> we miss it.    That said, if lock is already taken in one session, and
> some other session alters some relation we have a lock on, then that
> alternation can't have been done with an AEL, as that would have
> blocked on our lock, so it must be some ShareUpdateExclusiveLock
> change, and if so that change must not be important enough to block
> concurrently executing queries, so likely shouldn't actually break
> anything.
>
> So in summary, I think that checking for invalidation messages at the
> start of the statement allows us to replan within a transaction, but
> does not guarantee we can the exact correct plan for the current
> settings.


Right, but that's also an inevitably consequence of postponing
locking.  If you start running the statement without locking every
object involved in the query, then somebody could be holding an
AccessExclusiveLock on that object and changing it in any way that
they're allowed to do without holding a lock that does conflict with
one you're holding.  Nothing we do with invalidation messages can
prevent that from happening, though it can make the race narrower. We
*must* be able to cope with having the out-of-date plan or this whole
idea is sunk.

> I also don't have a full grasp on why we must ignore invalidation
> messages when we already have a lock on the relation.  I assume it's
> not a huge issue since obtaining a new lock on any other relation will
> process the invalidation queue anyway. I know that f868a8143a9
> recently made some changes to fix some recursive issues around these
> invalidations.

It's not necessary.  It's a performance optimization.  The idea is: we
need our caches to be up to date.  Anything that changes an object
should take AccessExclusiveLock, send invalidation messages, and only
then release the AccessExclusiveLock.  Anything that looks at an
object should acquire at least AccessShareLock and then process
invalidation messages.  This protocol guarantees that any changes that
are made under AEL will be seen in a race-free way, since the
invalidation messages are sent before releasing the lock and processed
after taking one.  But it doesn't require processing those messages
when we haven't taken a lock, so we don't.

> (I admit to not having the best grasp on how all this works, so feel
> free to shoot this down)

I think the key question here is whether or not you can cope with
someone having done arbitrary AEL-requiring modifications to the
delaylocked partitions.  If you can, the fact that the plan might
sometimes be out-of-date is an inevitable consequence of doing this at
all, but I think we can argue that it's an acceptable consequence as
long as the resulting behavior is not too bletcherous.  If you can't,
then this patch is dead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Delay locking partitions during query execution

Reply via email to