On Tue, 19 Feb 2019 at 12:13, David Rowley <david.row...@2ndquadrant.com> wrote:
>
> On Tue, 12 Feb 2019 at 09:58, Julien Rouhaud <rjuju...@gmail.com> wrote:
> >
> > On Mon, Feb 11, 2019 at 5:32 AM David Rowley
> > <david.row...@2ndquadrant.com> wrote:
> > > 1. Adding a new field to RangeTblEntry to indicate the operation type
> > > that's being performed on the relation; or
> > > 2. Adding a Bitmapset field to PlannerGlobal that sets the rtable
> > > indexes of RangeTblEntry items that belong to DELETEs and ignore these
> > > when setting resultRelids in finalize_lockmodes().
> > >
> > > For #2, the only place I can see to do this is
> > > add_rtes_to_flat_rtable(), which would require either passing the
> > > PlannerInfo into the function, or at least its parse's commandType.
> > >
> > > I don't really like either, but don't have any other ideas at the moment.
> >
> > But we would still need the same lock level upgrade logic on indexes
> > for cases like CTE with a mix of INSERT, UPDATE and DELETE on the same
> > relation I think. #1 seems like a better solution to me.
>
> I think I'd rather find some way to do it that didn't denormalise the
> parse nodes like that. It seems very strange to have a CmdType in the
> Query struct, and then another set of them in RangeTblEntry. Besides
> bloating the size of the RangeTblEntry struct a bit, it also could
> lead to inconsistency bugs where the two CmdTypes differ, for some
> reason.
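(Just to make the quoted option #1 concrete: it would amount to something
like the sketch below. This is purely illustrative, not from any posted
patch, and the new field name is made up; it's exactly the kind of
denormalisation I'd rather avoid.)

    typedef struct RangeTblEntry
    {
        NodeTag     type;
        RTEKind     rtekind;

        /* ... all of the existing fields ... */

        CmdType     rtecmdtype; /* hypothetical field: the operation
                                 * (INSERT/UPDATE/DELETE) being performed
                                 * on this relation, duplicating what
                                 * Query->commandType already records */
    } RangeTblEntry;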
I had another go at this patch and fixed the problem by just setting the
idxlockmode inside the planner just before the call to
expand_inherited_tables(). This allows the lockmode to be copied into
child RTEs.

Doing it later in planning is more of a problem since we don't really
store all the result relations in PlannerInfo; we only store the one
that was written in the query. The others are stored in the ModifyTable
path and then later in the ModifyTable plan. The only other place I
could see to do it (without adding an extra field in PlannerInfo) was
during createplan... which does not at all seem like the right place,
and is also too late for get_relation_info(); see below.

finalize_lockmodes() now does the same thing for idxlockmode as it does
for rellockmode, i.e., just sets the strongest lock for each unique rel
oid.

However, during my work on this, I saw a few things that made me wonder
if all this is over-engineered and worth the trouble.

The first thing I saw was that in get_relation_info() we obtain a
RowExclusiveLock if the relation is the result relation. This means we
obtain a RowExclusiveLock for DELETE targets too. This is inconsistent
with what the executor tries to do and results in us taking a weaker
lock during execution than we do during planning. It also means we
don't bump into the lock in the local lock table, which results in
slower locking. That problem would be exaggerated with partitioned
tables with a large number of partitions or inheritance parents with
lots of children; however, that'll likely be drowned out by all the
other slow stuff around there.

Another thing that's on my mind is that in the patch, in
nodeModifyTable.c, we still do:

    if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
        operation != CMD_DELETE &&
        resultRelInfo->ri_IndexRelationDescs == NULL)
        ExecOpenIndices(resultRelInfo,
                        node->onConflictAction != ONCONFLICT_NONE);

i.e. we don't open the indexes for DELETEs. I had ideas that maybe this
could be changed to check the idxlockmode and open the indexes if it's
above AccessShareLock. There didn't seem to be a very nice way to fetch
the RangeTblEntry from the ResultRelInfo though, so I didn't do that,
but leaving it as is does not really work well with the patch, as I
really want the planner to be the thing that decides what happens here.

My final thoughts were that I see a lot of work going on to implement
pluggable storage. I ended up having an offlist conversation with
Thomas Munro about zheap and what its requirements are for locking
indexes during deletes. It seems modifying the index during a delete is
not something that happens with the current version, but it is planned
for future versions. In any case, we can't really assume zheap is going
to be the only additional storage engine implementation that gets
hooked into this.

Current thoughts are, do we:

1. Allow the storage engine to communicate its locking needs via an API
function call and add code to check that in the determine_index_lockmode
function (new in the attached patch). (There's a rough sketch of this
idea near the end of this mail.) Or:
2. Just get rid of the "operation != CMD_DELETE &&" line in the above
code and always lock indexes for DELETE targets in RowExclusiveLock mode.

#2 would not address Tom's mention of there possibly being some way to
have the index scan node initialise before the modify table node
(currently we pass NoLock for indexes belonging to result rels in the
index scan nodes). I can't quite imagine at the moment how that's
possible, but maybe my imagination is just not good enough.
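For reference, the index scan nodes currently do something along these
lines when opening the index (paraphrased from ExecInitIndexScan(); the
real code differs in detail):

    /*
     * Paraphrased sketch: indexes belonging to result relations are
     * opened with NoLock here; all other indexes get AccessShareLock.
     */
    relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
    indexstate->iss_RelationDesc = index_open(node->indexid,
                                              relistarget ? NoLock
                                                          : AccessShareLock);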
We could fix that by passing RowExclusiveLock instead of NoLock for the
result-rel indexes there. It just means we'd discover the lock already
exists in the local lock table... unless of course there is a case
where the index scan gets initialised before modify table is.

I'm adding Andres, Robert, Thomas, Amit and Haribabu due to their
involvement with pluggable storage and zheap.
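To make option #1 above a little more concrete, the rough shape I have
in mind is something like the following. This is only a sketch; none of
it is in the attached patch, and the callback name and the function
signature are made up for illustration.

    /*
     * Hypothetical: a callback by which a storage engine reports the
     * lock level it needs on a table's indexes for a given command.
     * Assumes the usual backend environment (postgres.h, nodes/nodes.h,
     * storage/lockdefs.h).
     */
    typedef LOCKMODE (*IndexLockmodeCallback) (CmdType operation);

    /*
     * Sketch of determine_index_lockmode() consulting such a callback
     * rather than hard-coding the heap behaviour of skipping index work
     * for DELETE.  The function in the attached patch is not necessarily
     * shaped like this.
     */
    static LOCKMODE
    determine_index_lockmode(CmdType operation,
                             IndexLockmodeCallback am_index_lockmode)
    {
        if (am_index_lockmode != NULL)
            return am_index_lockmode(operation);

        /* heap-like default: DELETE doesn't add or change index entries */
        if (operation == CMD_DELETE)
            return AccessShareLock;     /* or NoLock; part of the debate */

        return RowExclusiveLock;
    }

zheap, or any other AM that does need to touch indexes during a delete,
could then just return RowExclusiveLock for CMD_DELETE.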
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

idxlockmode_and_lock_upgrade_fix_v3.patch