On Tue, 19 Feb 2019 at 12:13, David Rowley <david.row...@2ndquadrant.com> wrote:
>
> On Tue, 12 Feb 2019 at 09:58, Julien Rouhaud <rjuju...@gmail.com> wrote:
> >
> > On Mon, Feb 11, 2019 at 5:32 AM David Rowley
> > <david.row...@2ndquadrant.com> wrote:
> > > 1. Adding a new field to RangeTblEntry to indicate the operation type
> > > that's being performed on the relation; or
> > > 2. Adding a Bitmapset field to PlannerGlobal that sets the rtable
> > > indexes of RangeTblEntry items that belong to DELETEs and ignore these
> > > when setting resultRelids in finalize_lockmodes().
> > >
> > > For #2, the only place I can see to do this is
> > > add_rtes_to_flat_rtable(), which would require either passing the
> > > PlannerInfo into the function, or at least its parse's commandType.
> > >
> > > I don't really like either, but don't have any other ideas at the moment.
> >
> > But we would still need the same lock level upgrade logic on indexes
> > for cases like CTE with a mix of INSERT, UPDATE and DELETE on the same
> > relation I think. #1 seems like a better solution to me.
>
> I think I'd rather find some way to do it that didn't denormalise the
> parse nodes like that. It seems very strange to have a CmdType in the
> Query struct, and then another set of them in RangeTblEntry. Besides
> bloating the size of the RangeTblEntry struct a bit, it also could
> lead to inconsistency bugs where the two CmdTypes differ, for some
> reason.
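(Just to make the quoted option #1 concrete: it would amount to something
like the sketch below. This is purely illustrative, not from any posted
patch, and the new field name is made up; it's exactly the kind of
denormalisation I'd rather avoid.)

    typedef struct RangeTblEntry
    {
        NodeTag     type;
        RTEKind     rtekind;

        /* ... all of the existing fields ... */

        CmdType     rtecmdtype; /* hypothetical field: the operation
                                 * (INSERT/UPDATE/DELETE) being performed
                                 * on this relation, duplicating what
                                 * Query->commandType already records */
    } RangeTblEntry;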
I had another go at this patch and fixed the problem by just setting the
idxlockmode inside the planner just before the call to
expand_inherited_tables(). This allows the lockmode to be copied into
child RTEs.

Doing it later in planning is more of a problem since we don't really
store all the result relations in PlannerInfo; we only store the one
that was written in the query. The others are stored in the ModifyTable
path and then later in the ModifyTable plan. The only other place I
could see to do it (without adding an extra field in PlannerInfo) was
during createplan... which does not at all seem like the right place,
and is also too late for get_relation_info(); see below.

finalize_lockmodes() now does the same thing for idxlockmode as it does
for rellockmode, i.e., just sets the strongest lock for each unique rel
oid.

However, during my work on this, I saw a few things that made me wonder
if all this is over-engineered and worth the trouble.

The first thing I saw was that in get_relation_info() we obtain a
RowExclusiveLock if the relation is the result relation. This means we
obtain a RowExclusiveLock for DELETE targets too. This is inconsistent
with what the executor tries to do and results in us taking a weaker
lock during execution than we do during planning. It also means we
don't bump into the lock in the local lock table, which results in
slower locking. That problem would be exaggerated with partitioned
tables with a large number of partitions or inheritance parents with
lots of children; however, that'll likely be drowned out by all the
other slow stuff around there.

Another thing that's on my mind is that in the patch, in
nodeModifyTable.c, we still do:

    if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
        operation != CMD_DELETE &&
        resultRelInfo->ri_IndexRelationDescs == NULL)
        ExecOpenIndices(resultRelInfo,
                        node->onConflictAction != ONCONFLICT_NONE);

i.e. we don't open the indexes for DELETEs. I had ideas that maybe this
could be changed to check the idxlockmode and open the indexes if it's
above AccessShareLock. There didn't seem to be a very nice way to fetch
the RangeTblEntry from the ResultRelInfo though, so I didn't do that,
but leaving it as is does not really work well with the patch, as I
really want the planner to be the thing that decides what happens here.

My final thoughts were that I see a lot of work going on to implement
pluggable storage. I ended up having an offlist conversation with
Thomas Munro about zheap and what its requirements are for locking
indexes during deletes. It seems modifying the index during a delete is
not something that happens with the current version, but it is planned
for future versions. In any case, we can't really assume zheap is going
to be the only additional storage engine implementation that gets
hooked into this.

Current thoughts are, do we:

1. Allow the storage engine to communicate its locking needs via an API
function call and add code to check that in the determine_index_lockmode
function (new in the attached patch). (There's a rough sketch of this
idea near the end of this mail.) Or:
2. Just get rid of the "operation != CMD_DELETE &&" line in the above
code and always lock indexes for DELETE targets in RowExclusiveLock mode.

#2 would not address Tom's mention of there possibly being some way to
have the index scan node initialise before the modify table node
(currently we pass NoLock for indexes belonging to result rels in the
index scan nodes). I can't quite imagine at the moment how that's
possible, but maybe my imagination is just not good enough.
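For reference, the index scan nodes currently do something along these
lines when opening the index (paraphrased from ExecInitIndexScan(); the
real code differs in detail):

    /*
     * Paraphrased sketch: indexes belonging to result relations are
     * opened with NoLock here; all other indexes get AccessShareLock.
     */
    relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
    indexstate->iss_RelationDesc = index_open(node->indexid,
                                              relistarget ? NoLock
                                                          : AccessShareLock);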
We could fix that by passing RowExclusiveLock instead of NoLock for the
result-rel indexes there. It just means we'd discover the lock already
exists in the local lock table... unless of course there is a case
where the index scan gets initialised before modify table is.

I'm adding Andres, Robert, Thomas, Amit and Haribabu due to their
involvement with pluggable storage and zheap.
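To make option #1 above a little more concrete, the rough shape I have
in mind is something like the following. This is only a sketch; none of
it is in the attached patch, and the callback name and the function
signature are made up for illustration.

    /*
     * Hypothetical: a callback by which a storage engine reports the
     * lock level it needs on a table's indexes for a given command.
     * Assumes the usual backend environment (postgres.h, nodes/nodes.h,
     * storage/lockdefs.h).
     */
    typedef LOCKMODE (*IndexLockmodeCallback) (CmdType operation);

    /*
     * Sketch of determine_index_lockmode() consulting such a callback
     * rather than hard-coding the heap behaviour of skipping index work
     * for DELETE.  The function in the attached patch is not necessarily
     * shaped like this.
     */
    static LOCKMODE
    determine_index_lockmode(CmdType operation,
                             IndexLockmodeCallback am_index_lockmode)
    {
        if (am_index_lockmode != NULL)
            return am_index_lockmode(operation);

        /* heap-like default: DELETE doesn't add or change index entries */
        if (operation == CMD_DELETE)
            return AccessShareLock;     /* or NoLock; part of the debate */

        return RowExclusiveLock;
    }

zheap, or any other AM that does need to touch indexes during a delete,
could then just return RowExclusiveLock for CMD_DELETE.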
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

idxlockmode_and_lock_upgrade_fix_v3.patch