Hi, While discussing partition related code with David in [1], I again was confused by the layering of partition related code in nodeModifyTable.c.
1) How come partition routing is done outside of ExecInsert()? case CMD_INSERT: /* Prepare for tuple routing if needed. */ if (proute) slot = ExecPrepareTupleRouting(node, estate, proute, resultRelInfo, slot); slot = ExecInsert(node, slot, planSlot, estate, node->canSetTag); /* Revert ExecPrepareTupleRouting's state change. */ if (proute) estate->es_result_relation_info = resultRelInfo; break; That already seems like a layering violation, but it's made worse by ExecUpdate() having its partition handling solely inside - including another call to ExecInsert(), including the surrounding partition setup code. And even worse, after all that, ExecInsert() still contains partitioning code. It seems to me that if we just moved the ExecPrepareTupleRouting() into ExecInsert(), we could remove the duplication. 2) The contents of the /* * If a partition check failed, try to move the row into the right * partition. */ if (partition_constraint_failed) block ought to be moved to a separate function (maybe ExecCrossPartitionUpdate or ExecMove). ExecUpdate() is already complicated enough without dealing with the partition move. 3) How come we reset estate->es_result_relation_info after partition routing, but not the mtstate wide changes by ExecPrepareTupleRouting()? Note that its comment says: * Caller must revert the estate changes after executing the insertion! * In mtstate, transition capture changes may also need to be reverted. ExecUpdate() contains /* * Updates set the transition capture map only when a new subplan * is chosen. But for inserts, it is set for each row. So after * INSERT, we need to revert back to the map created for UPDATE; * otherwise the next UPDATE will incorrectly use the one created * for INSERT. So first save the one created for UPDATE. */ if (mtstate->mt_transition_capture) saved_tcs_map = mtstate->mt_transition_capture->tcs_map; but as I read the code, that's not really true? It's ExecPrepareTupleRouting() that does so, and that's called directly in ExecUpdate(). 4) /* * If this insert is the result of a partition key update that moved the * tuple to a new partition, put this row into the transition NEW TABLE, * if there is one. We need to do this separately for DELETE and INSERT * because they happen on different tables. */ ar_insert_trig_tcs = mtstate->mt_transition_capture; if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture && mtstate->mt_transition_capture->tcs_update_new_table) { ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot, NULL, mtstate->mt_transition_capture); /* * We've already captured the NEW TABLE row, so make sure any AR * INSERT trigger fired below doesn't capture it again. */ ar_insert_trig_tcs = NULL; } /* AFTER ROW INSERT Triggers */ ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes, ar_insert_trig_tcs); Besides not using the just defined ar_insert_trig_tcs and instead repeatedly referring to mtstate->mt_transition_capture, wouldn't this be a easier to understand if the were an if/else, instead of resetting ar_insert_trig_tcs? If the block were /* * triggers behave differently depending on this being a delete as * part of a partion move, or a deletion proper. if (mtstate->operation == CMD_UPDATE) { /* * If this insert is the result of a partition key update that moved the * tuple to a new partition, put this row into the transition NEW TABLE, * if there is one. We need to do this separately for DELETE and INSERT * because they happen on different tables. */ ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot, NULL, mtstate->mt_transition_capture); /* * But we do want to fire plain per-row INSERT triggers on the * new table. By not passing in transition_capture we prevent * .... */ ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes, NULL); } else { /* AFTER ROW INSERT Triggers */ ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes, ar_insert_trig_tcs); } it seems like it'd be quite a bit clearer (although I do think the comments also need a fair bit of polishing independent of this proposed change). Greetings, Andres Freund [1] https://www.postgresql.org/message-id/CAKJS1f-YObQJTbncGJGRZ6gSFiS%2Bgw_Y5kvrpR%3DvEnFKH17AVA%40mail.gmail.com