On Wed, May 19, 2021 at 12:04 PM Michael Paquier <mich...@paquier.xyz> wrote:
> On Tue, May 18, 2021 at 07:42:08PM -0400, Tom Lane wrote:
> > I count three distinct bugs that were exposed by this attempt:
> >
> > 1. In the part of 013_partition.pl that tests firing AFTER
> > triggers on partitioned tables, we have a case of continuing
> > to access a relcache entry that's already been closed.
> > (I'm not quite sure why prion's -DRELCACHE_FORCE_RELEASE
> > hasn't exposed this.)  It looks to me like instead we had
> > a relcache reference leak before f3b141c48, but now, the
> > only relcache reference count on a partition child table
> > is dropped by ExecCleanupTupleRouting, which logical/worker.c
> > invokes before it fires triggers on that table.  Kaboom.

Oops.

> > This might go away if worker.c weren't so creatively different
> > from the other code paths concerned with executor shutdown.
>
> The tuple routing has made the whole worker logic messier by a larger
> degree compared to when this stuff was only able to apply DML changes
> on the partition leaves.  I know that it is not great to be more
> creative here, but we need to make sure that AfterTriggerEndQuery() is
> moved before ExecCleanupTupleRouting().  We could either keep the
> ExecCleanupTupleRouting() calls as they are now and call
> AfterTriggerEndQuery() in more code paths.

Yeah, that's what I thought to propose doing too.  Your patch looks
sufficient in that regard.  Thanks for writing it.
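For the archives, the essence of the fix is the following ordering in
the worker's cleanup path (a simplified sketch, not the patch verbatim;
the surrounding code and variable names here are assumed):

    /*
     * Fire any AFTER triggers queued while applying the change, while
     * the partition leaf's relcache entry is still open.
     */
    AfterTriggerEndQuery(estate);

    /*
     * Only then drop the tuple-routing state, which releases the last
     * relcache reference on the partition child table.
     */
    if (proute)
        ExecCleanupTupleRouting(mtstate, proute);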
> Or we could have one
> PartitionTupleRouting and one ModifyTableState created beforehand
> in create_estate_for_relation() if applying the change on a
> partitioned table, but that means manipulating more structures across
> more layers of this code.

Yeah, that seems like too much change to me as well.

> Something like the attached fixes the
> problem for me, but honestly it does not help in clarifying this code
> more.  I was not patient enough to wait for CLOBBER_CACHE_ALWAYS to
> initialize the nodes in the TAP tests, so I have tested the fix with a
> setup initialized by a non-clobber build, and reproduced the problem
> with CLOBBER_CACHE_ALWAYS builds on those same nodes.

I have checked that the fix works with a CLOBBER_CACHE_ALWAYS build.

> You are right that this is not a problem specific to 14~.  I can
> reproduce the problem on 13 as well, and we have no coverage for tuple
> routing with triggers on that branch, so this would never have been
> stressed in the buildfarm.  There is a good argument to be made here
> for cherry-picking 2ecfeda3 to REL_13_STABLE.

+1

--
Amit Langote
EDB: http://www.enterprisedb.com