On Thu, Sep 3, 2020 at 6:50 PM Amit Langote <amitlangot...@gmail.com> wrote: > > Hi, > > Starting a new thread to discuss a bug related to $subject that Hao Wu > reported on thread titled "ALTER TABLE .. DETACH PARTITION > CONCURRENTLY" [1]. I have been able to reproduce the bug using steps > that Hao gave in that email: > > create table tpart (i int, j int) partition by range(i); > create table tpart_1 (like tpart); > create table tpart_2 (like tpart); > create table tpart_default (like tpart); > alter table tpart attach partition tpart_1 for values from (0) to (100); > alter table tpart attach partition tpart_default default; > insert into tpart_2 values (110,110), (120,120), (150,150); > > Session 1: > > begin; > alter table tpart attach partition tpart_2 for values from (100) to (200); > > Session 2: > > begin; > insert into tpart values (110,110), (120,120), (150,150); > <blocks waiting for the concurrent attach to finish> > > Session 1: > > end; > > Session 2: > > select tableoid::regclass, * from tpart; > end; > > The select will show that rows inserted by session 2 are inserted into > tpart_default, whereas after successfully attaching tpart_2, they do > not actually belong there. > > The problem is that when session 2 inserts those rows into tpart, it > only knows about 2 partitions: tpart_1, tpart_default, of which it > selects tpart_default to insert those rows into. When tpart_default > is locked to perform the insert, it waits for session 1 to release the > lock taken on tpart_default during the attach command. When it is > unblocked, it proceeds to finish the insert without rechecking the > partition constraint which would have been updated as result of a new > partition having been added to the parent. > > Note that we don't normally check the partition constraint when > inserting a row into a partition if the insert occurs via tuple > routing, which makes sense for non-default partitions whose partition > constraint cannot change due to concurrent activity. But this test > case has shown that the assumption is not safe for a default partition > whose constraint is a function of other partitions that exist as of > when the insert occurs. > > By the way, if you reverse the order of operations between session 1 > and 2 such that the insert by session 2 occurs first and then the > attach by session 1, then you will correctly get this error from the > attach command: > > ERROR: updated partition constraint for default partition > "tpart_default" would be violated by some row > > Attached is a patch to fix things on the insert side.
Forgot to mention that the problem exists as of v12 (commit: 898e5e329). -- Amit Langote EnterpriseDB: http://www.enterprisedb.com