On 7 November 2018 at 21:31, Kato, Sho <kato-...@jp.fujitsu.com> wrote: > AFAIK, When CREATE INDEX on a partition and INSERT to a parent table are > executed at the same time, this patch causes deadlock. > > * partitions information > > Partition key: RANGE (a) > Partitions: a_1 FOR VALUES FROM (1) TO (100), > a_2 FOR VALUES FROM (100) TO (200) > > T1: create index a_1_ix on a_1(a); > T2: insert into a values(101),(1); locking a_2 and waiting releasing a_1’s > lock > T1: create index a_2_ix on a_2(a); ERROR: deadlock detected > > I think this situation does not mean unsafe because similar situation will > occurs at no partitioned tables and DBMS could not prevent this situation. > > But, I'm not sure this behavior is correct.
That's one case that will deadlock with the 0002 patch in [1]. Another command that takes a conflicting lock on the partition only is TRUNCATE. Many other commands that normally take an AEL can't be run on a partition, for example, ALTER TABLE ADD COLUMN. The same would have deadlocked on master if you'd created the indexes in the reverse order, but I guess the point is that today there's some defined order that you can create the indexes in that won't cause a deadlock, but with [1] there is no defined order since the tables are locked in order that tuples are routed to them. Robert pointed out something interesting in [2] about UPDATEs of a partition key perhaps routing a tuple to a partition that's been excluded by constraint exclusion, so the lock is not taken initially, but would be taken later in ExecSetupPartitionTupleRouting(). I ran through that scenario today and discovered it can't happen as excluded partitions are still in the rangetable, so are still locked either in the planner or by AcquireExecutorLocks() and the order those are locked in is well defined from the planner. However, I believe that's something Amit plans to change in [3], where he proposes to only lock partitions that survive partition pruning (among many other things). There are probably a few things we could do here: a. Have all DDL on partitions obtain the same lock level on all ancestor partitioned tables taking a lock on the root first and working down to the leaf. b. Just lock what we touch and increase the likelihood of deadlocks occurring. c. Reject [1] and [3] because we don't want to increase the chances of deadlocks. I'm highly against doing 'c' since both [1] and [3] are very useful patches which scale partitioning to work well with very large numbers of partitions. At the moment we just can bearly do half a dozen partitions before the performance starts going south. 'a' is not that ideal a solution either as it means that anyone doing CREATE INDEX or TRUNCATE on a partition would conflict with queries that are querying an unrelated part of the partitioned table. While I don't really like 'b' either, I wonder how many people are going to hit deadlocks here. To hit that, I believe that you'd have to be doing the TRUNCATE or CREATE INDEX inside a transaction and to hit it where you couldn't hit it before you'd have had to carefully written your CREATE INDEX / TRUNCATE statements to ensure they're executed in partition order. I'm unsure how likely that is, but I certainly can't rule out that doing 'b' won't upset anyone. Perhaps a GUC is needed to choose between 'a' and 'b'? I'm adding Amit to this email because [3] is his and Robert because he's recently been talking about partition locking too. [1] https://www.postgresql.org/message-id/CAKJS1f-vYAHqqaU878Q4cdZYHwPcv1J_C-mG%3DCs2UwhsD6cqwg%40mail.gmail.com [2] https://www.postgresql.org/message-id/CA%2BTgmoZGJsy-nRFnzurhZQJtHdDh5fzJKvbvhS0byN6_46pB9Q%40mail.gmail.com [3] https://commitfest.postgresql.org/20/1778/ -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services