On Thu, Dec 20, 2018 at 4:38 PM Robert Haas <robertmh...@gmail.com> wrote:
> Lowering the lock level might also make something that was previously
> safe into something unsafe, because now there's no longer a guarantee
> that invalidation messages are received soon enough.  With
> AccessExclusiveLock, we'll send invalidation messages before releasing
> the lock, and other processes will acquire the lock and then
> AcceptInvalidationMessages().  But with ShareUpdateExclusiveLock the
> locks can coexist, so now there might be trouble.  I think this is an
> area where we need to do some more investigation.
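To see that coexistence directly, here's a throwaway example with DETACH
lowered to ShareUpdateExclusiveLock as discussed above; the pg_locks query
at the end is just for illustration:

create table p (a int) partition by range (a);
create table p1 partition of p for values from (0) to (10);

Session 1 takes the lowered lock and leaves it open:

begin;
alter table p detach partition p1;  -- ShareUpdateExclusiveLock, not AccessExclusiveLock

Session 2 is not blocked, so nothing forces it to process session 1's
invalidation messages before planning or executing:

begin;
select * from p;

Still in session 2, both sessions' locks on p and p1 show as granted at
the same time:

select c.relname, l.pid, l.mode, l.granted
from pg_locks l join pg_class c on c.oid = l.relation
where c.relname in ('p', 'p1');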
So there are definitely problems here.  With my patch:

create table tab (a int, b text) partition by range (a);
create table tab1 partition of tab for values from (0) to (10);
prepare t as select * from tab;
begin;
explain execute t; -- seq scan on tab1
execute t; -- no rows

Then, in another session:

alter table tab detach partition tab1;
insert into tab1 values (300, 'oops');

Back to the first session:

execute t; -- shows (300, 'oops')
explain execute t; -- still planning to scan tab1
commit;
explain execute t; -- now it got the memo, and plans to scan nothing
execute t; -- no rows

Well, that's not good.  We're showing a value that was never within the
partition bounds of any partition of tab.  The problem is that, since we
already have locks on all relevant objects, nothing triggers the second
'explain execute' to process invalidation messages, so we don't update
the plan.  Generally, any DDL with less than AccessExclusiveLock has
this issue.

On another thread, I was discussing with Tom and Peter the possibility
of trying to rejigger things so that we always
AcceptInvalidationMessages() at least once per command, but I think
that just turns this into a race: if a concurrent commit happens after
'explain execute t' decides not to re-plan but before it begins
executing, we have the same problem.

This example doesn't involve partition pruning, and in general I don't
think that the problem is confined to partition pruning.  It's rather
that if there's no conflict between the lock that is needed to change
the set of partitions and the lock that is needed to run a query, then
there's no way to guarantee that the query runs with the same set of
partitions for which it was planned.

Unless I'm mistaken, which I might be, this is also a problem with your
approach -- if you repeat the same prepared query in the same
transaction, the transaction snapshot will be updated, and thus the
PartitionDesc will be expanded differently at execution time, but the
plan will not have changed, because invalidation messages have not been
processed.

Anyway, I think the only fix here is likely to be making the executor
resilient against concurrent changes in the PartitionDesc.  I don't
think there's going to be any easy way to compensate for added
partitions without re-planning, but maybe we could find a way to flag
detached partitions so that they return no rows without actually
touching the underlying relation.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
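P.S. I'd expect the attach side to misbehave in the mirror-image way,
though I haven't actually run this variant, so take it as a sketch
rather than a tested transcript.  With the same tab/tab1/t setup as
above, in the first session:

begin;
execute t; -- no rows, plan only knows about tab1

Then, in another session:

create table tab2 (a int, b text);
insert into tab2 values (15, 'missed');
alter table tab attach partition tab2 for values from (10) to (20);

Back in the first session:

execute t; -- presumably still no rows: nothing has forced invalidation
           -- processing, so the cached plan never learns about tab2

That's the added-partitions case that I don't see any way to handle
short of re-planning.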