On Fri, Dec 15, 2017 at 7:58 AM, Robert Haas <robertmh...@gmail.com> wrote: > Reviewing the preparatory patch:
I started another review pass over the main patch, so here are some comments about that. This is unfortunately not a complete review, however. - map = ptr->partition_tupconv_maps[leaf_part_index]; + map = ptr->parentchild_tupconv_maps[leaf_part_index]; I don't think there's any reason to rename this. In previous patch versions, you had multiple arrays of tuple conversion maps in this structure, but the refactoring eliminated that. Likewise, I'm not sure I get the point of mt_transition_tupconv_maps -> mt_childparent_tupconv_maps. That seems like it could similarly be left alone. + /* + * If transition tables are the only reason we're here, return. As + * mentioned above, we can also be here during update tuple routing in + * presence of transition tables, in which case this function is called + * separately for oldtup and newtup, so either can be NULL, not both. + */ if (trigdesc == NULL || (event == TRIGGER_EVENT_DELETE && !trigdesc->trig_delete_after_row) || (event == TRIGGER_EVENT_INSERT && !trigdesc->trig_insert_after_row) || - (event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row)) + (event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row) || + (event == TRIGGER_EVENT_UPDATE && ((oldtup == NULL) ^ (newtup == NULL)))) I guess this is correct, but it seems awfully fragile. Can't we have some more explicit signaling about whether we're only here for transition tables, rather than deducing it based on exactly one of oldtup and newtup being NULL? + /* Initialization specific to update */ + if (mtstate && mtstate->operation == CMD_UPDATE) + { + ModifyTable *node = (ModifyTable *) mtstate->ps.plan; + + is_update = true; + update_rri = mtstate->resultRelInfo; + num_update_rri = list_length(node->plans); + } I guess I don't see why we need a separate "if" block for this. Neither is_update nor update_rri nor num_update_rri are used until we get to the block that begins with "if (is_update)". Why not just change that block to test if (mtstate && mtstate->operation == CMD_UPDATE)" and put the rest of these initializations inside that block? + int num_update_rri = 0, + update_rri_index = 0; ... + update_rri_index = 0; It's already 0. + leaf_part_rri = &update_rri[update_rri_index]; ... + leaf_part_rri = leaf_part_arr + i; These are doing the same kind of thing, but using different styles. I prefer the former style, so I'd change the second one to &leaf_part_arr[i]. Alternatively, you could change the first one to update_rri + update_rri_indx. But it's strange to see the same variable initialized in two different ways just a few lines apart. + if (!partrel) + { + /* + * We locked all the partitions above including the leaf + * partitions. Note that each of the newly opened relations in + * *partitions are eventually closed by the caller. + */ + partrel = heap_open(leaf_oid, NoLock); + InitResultRelInfo(leaf_part_rri, + partrel, + resultRTindex, + rel, + estate->es_instrument); + } Hmm, isn't there a problem here? Before, we opened all the relations here and the caller closed them all. But now, we're only opening some of them. If the caller closes them all, then they will be closing some that we opened and some that we didn't. That seems quite bad, because the reference counts that are incremented and decremented by opening and closing should all end up at 0. Maybe I'm confused because it seems like this would break in any scenario where even 1 relation was already opened and surely you must have tested that case... but if there's some reason this works, I don't know what it is, and the comment doesn't tell me. +static HeapTuple +ConvertPartitionTupleSlot(ModifyTableState *mtstate, + TupleConversionMap *map, + HeapTuple tuple, + TupleTableSlot *new_slot, + TupleTableSlot **p_my_slot) This function doesn't use the mtstate argument at all. + * (Similarly we need to add the deleted row in OLD TABLE). We need to do The period should be before, not after, the closing parenthesis. + * Now that we have already captured NEW TABLE row, any AR INSERT + * trigger should not again capture it below. Arrange for the same. A more American style would be something like "We've already captured the NEW TABLE row, so make sure any AR INSERT trigger fired below doesn't capture it again." (Similarly for the other case.) + /* The delete has actually happened, so inform that to the caller */ + if (tuple_deleted) + *tuple_deleted = true; In the US, we inform the caller, not inform that to the caller. In other words, here the direct object of "inform" is the person or thing getting the information (in this case, "the caller"), not the information being conveyed (in this case, "that"). I realize your usage is probably typical for your country... + Assert(mtstate->mt_is_tupconv_perpart == true); We usually just Assert(thing_that_should_be_true), not Assert(thing_that_should_be_true == true). + * In case this is part of update tuple routing, put this row into the + * transition OLD TABLE if we are capturing transition tables. We need to + * do this separately for DELETE and INSERT because they happen on + * different tables. Maybe "...OLD table, but only if we are..." Should it be capturing transition tables or capturing transition tuples? I'm not sure. + * partition, in which case, we should check the RLS CHECK policy just In the US, the second comma in this sentence is incorrect and should be removed. + * When an UPDATE is run with a leaf partition, we would not have + * partition tuple routing setup. In that case, fail with run with -> run on would not -> will not setup -> set up + * deleted by another transaction), then we should skip INSERT as + * well, otherwise, there will be effectively one new row inserted. skip INSERT -> skip the insert well, otherwise -> well; otherwise I would also change "there will be effectively one new row inserted" to "an UPDATE could cause an increase in the total number of rows across all partitions, which is clearly wrong". + /* + * UPDATEs set the transition capture map only when a new subplan + * is chosen. But for INSERTs, it is set for each row. So after + * INSERT, we need to revert back to the map created for UPDATE; + * otherwise the next UPDATE will incorrectly use the one created + * for INESRT. So first save the one created for UPDATE. + */ + if (mtstate->mt_transition_capture) + saved_tcs_map = mtstate->mt_transition_capture->tcs_map; UPDATEs -> Updates INESRT -> INSERT I wonder if there is some more elegant way to handle this problem. Basically, the issue is that ExecInsert() is stomping on mtstate->mt_transition_capture, and your solution is to save and restore the value you want to have there. But maybe we could instead find a way to get ExecInsert() not to stomp on that state in the first place. It seems like the ON CONFLICT stuff handled that by adding a second TransitionCaptureState pointer to ModifyTable, thus mt_transition_capture and mt_oc_transition_capture. By that precedent, we could add mt_utr_transition_capture or similar, and maybe that's the way to go. It seems a bit unsatisfying, but so does what you have now. + * 2. For capturing transition tables that are partitions. For UPDATEs, we need This isn't worded well. A transition table is never a partition; transition tables and partitions are two different kinds of things. + * If per-leaf map is required and the map is already created, that map + * has to be per-leaf. If that map is per-subplan, we won't be able to + * access the maps leaf-partition-wise. But if the map is per-leaf, we + * will be able to access the maps subplan-wise using the + * subplan_partition_offsets map using function + * tupconv_map_for_subplan(). So if the callers might need to access + * the map both leaf-partition-wise and subplan-wise, they should make + * sure that the first time this function is called, it should be + * called with perleaf=true so that the map created is per-leaf, not + * per-subplan. This sounds complicated and fragile. It ends up meaning that mt_childparent_tupconv_maps is sometimes indexed by subplan number and sometimes by partition leaf index, which is extremely confusing and likely to lead to coding errors, either in this patch or in future ones. Would it be reasonable to just always do this by partition leaf index, even if we don't strictly need that set of mappings? That's all I've got for now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company