Hi Bertrand, On Wed, Feb 24, 2021 at 5:56 AM Drouvot, Bertrand <bdrou...@amazon.com> wrote: > Hi hackers, > > Here is a scenario that segfault during delete (with version >= 12): > > create table parent ( > col1 text primary key > ); > > create table child ( > col1 text primary key, > FOREIGN KEY (col1) REFERENCES parent(col1) on delete cascade > ); > > CREATE or replace FUNCTION trigger_function() > RETURNS TRIGGER > LANGUAGE PLPGSQL > AS $$ > BEGIN > raise notice 'trigger = %, old table = %', > TG_NAME, > (select string_agg(old_table::text, ', ' order by col1) from > old_table); > return NULL; > END; > $$ > ; > > create trigger bdt_trigger after delete on child REFERENCING OLD TABLE AS > old_table for each statement EXECUTE function trigger_function(); > > alter table child add column col2 text not null default 'tutu'; > insert into parent(col1) values ('1'); > insert into child(col1) values ('1'); > insert into parent(col1) values ('2'); > insert into child(col1) values ('2'); > delete from parent; > > produces: > > CREATE TABLE > CREATE TABLE > CREATE FUNCTION > CREATE TRIGGER > ALTER TABLE > INSERT 0 1 > INSERT 0 1 > INSERT 0 1 > INSERT 0 1 > psql:./segfault.repro.sql:35: server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > psql:./segfault.repro.sql:35: fatal: connection to server was lost > > The column being added to the child relation needs to have a default value > for the segfault to be triggered. > > The stack is the following: > > #0 0x000000000049aa59 in execute_attr_map_slot (attrMap=0x11736c8, > in_slot=0x1179c90, out_slot=0x1179da8) at tupconvert.c:193 > #1 0x0000000000700ec0 in AfterTriggerSaveEvent (estate=0x1171820, > relinfo=0x1171cc8, event=1, row_trigger=true, oldslot=0x1179c90, newslot=0x0, > recheckIndexes=0x0, modifiedCols=0x0, transition_capture=0x1172438) at > trigger.c:5488 > #2 0x00000000006fc6a8 in ExecARDeleteTriggers (estate=0x1171820, > relinfo=0x1171cc8, tupleid=0x7ffd2b7f2e40, fdw_trigtuple=0x0, > transition_capture=0x1172438) at trigger.c:2565 > #3 0x0000000000770794 in ExecDelete (mtstate=0x1171a90, > resultRelInfo=0x1171cc8, tupleid=0x7ffd2b7f2e40, oldtuple=0x0, > planSlot=0x1173630, epqstate=0x1171b88, estate=0x1171820, > processReturning=true, canSetTag=true, changingPart=false, tupleDeleted=0x0, > epqreturnslot=0x0) > at nodeModifyTable.c:1128 > #4 0x00000000007724cc in ExecModifyTable (pstate=0x1171a90) at > nodeModifyTable.c:2259 > > I had a look at it, and it looks to me that the slot being reused in > AfterTriggerSaveEvent() (when (map != NULL) and not (!storeslot)) has > previously been damaged by FreeExecutorState() in standard_ExecutorEnd().
Right. > Then I ended up with the enclosed patch proposal that does not make the repro > segfaulting and that is passing make check too. > > Not sure that what is being done in the patch is the right/best approach to > solve the issue though. Hmm, I don't think we should be *freshly* allocating the TupleTableSlot every time. Not having to do that is why the commit ff11e7f4b9ae0 seems to have invented AfterTriggersTableData.storeslot in the first place, that is, to cache the slot once created. Problem with the current way as you've discovered is that the slot gets allocated in the execution-span memory context, whereas the AfterTriggersTableData instance, of which the slot is a part, is supposed to last the entire (sub-)transaction. So the correct fix I think is to allocate the slot to be stored in AfterTriggersTableData.storeslot in the transaction-span memory context as well. Having taken care of that, another problem with the current way is that it adds the slot to es_tupleTable by calling ExecAllocTupleSlot(), which means that the slot will be released when ExecResetTupleTable() is called on es_tupleTable as part of ExecEndPlan(). That would defeat the point of allocating the slot in the transaction-span context. So let's use MakeSingleTableSlot() to allocate a standalone slot to be stored in AfterTriggersTableData.storeslot and have AfterTriggerFreeQuery() call ExecDropSingleTupleTableSlot() to release it. > PS: the same repro with a foreign key with update cascade, an update trigger > and an update on the col1 column would segfault too (but does not with the > enclosed patch proposal). Actually, we also need to fix similar code in the block for populating the transition NEW TABLE, because without doing so that block too can crash similarly: if (!TupIsNull(newslot) && ((event == TRIGGER_EVENT_INSERT && insert_new_table) || (event == TRIGGER_EVENT_UPDATE && update_new_table))) I've attached a patch with my suggested fixes and also test cases. Please take a look. -- Amit Langote EDB: http://www.enterprisedb.com
v2-0001-Fix-use-after-tree-bug-with-AfterTriggersTableDat.patch
Description: Binary data