Excerpts from Noah Misch's message of dom dic 04 09:20:27 -0300 2011:

> > +    /*
> > +     * If the tuple we're updating is locked, we need to preserve this in 
> > the
> > +     * new tuple's Xmax as well as in the old tuple.  Prepare the new xmax
> > +     * value for these uses.
> > +     *
> > +     * Note there cannot be an xmax to save if we're changing key columns; 
> > in
> > +     * this case, the wait above should have only returned when the locking
> > +     * transactions finished.
> > +     */
> > +    if (TransactionIdIsValid(keep_xmax))
> > +    {
> > +        if (keep_xmax_multi)
> > +        {
> > +            keep_xmax_old = MultiXactIdExpand(keep_xmax,
> > +                                              xid, MultiXactStatusUpdate);
> > +            keep_xmax_infomask = HEAP_XMAX_KEYSHR_LOCK | 
> > HEAP_XMAX_IS_MULTI;
> 
> Not directly related to this line, but is the HEAP_IS_NOT_UPDATE bit getting
> cleared where needed?

AFAICS it's reset along the rest of the HEAP_LOCK_BITS when the tuple is
modified.

> > @@ -2725,11 +2884,20 @@ l2:
> >          oldtup.t_data->t_infomask &= ~(HEAP_XMAX_COMMITTED |
> >                                         HEAP_XMAX_INVALID |
> >                                         HEAP_XMAX_IS_MULTI |
> > -                                       HEAP_IS_LOCKED |
> > +                                       HEAP_LOCK_BITS |
> >                                         HEAP_MOVED);
> > +        oldtup.t_data->t_infomask2 &= ~HEAP_UPDATE_KEY_INTACT;
> >          HeapTupleClearHotUpdated(&oldtup);
> >          /* ... and store info about transaction updating this tuple */
> > -        HeapTupleHeaderSetXmax(oldtup.t_data, xid);
> > +        if (TransactionIdIsValid(keep_xmax_old))
> > +        {
> > +            HeapTupleHeaderSetXmax(oldtup.t_data, keep_xmax_old);
> > +            oldtup.t_data->t_infomask |= keep_xmax_old_infomask;
> > +        }
> > +        else
> > +            HeapTupleHeaderSetXmax(oldtup.t_data, xid);
> > +        if (key_intact)
> > +            oldtup.t_data->t_infomask2 |= HEAP_UPDATE_KEY_INTACT;
> >          HeapTupleHeaderSetCmax(oldtup.t_data, cid, iscombo);
> >          /* temporarily make it look not-updated */
> >          oldtup.t_data->t_ctid = oldtup.t_self;
> 
> Shortly after this, we release the content lock and go off toasting the tuple
> and finding free space.  When we come back, could the old tuple have
> accumulated additional KEY SHARE locks that we need to re-copy?

Yeah, I've been wondering about this: do we have a problem already with
exclusion constraints?  I mean, if a concurrent inserter doesn't see the
tuple that we've marked here as deleted while we toast it, it could
result in a violated constraint, right?  I haven't built a test case to
prove it.


> > @@ -3231,30 +3462,70 @@ l3:
> >      {
> >          TransactionId xwait;
> >          uint16        infomask;
> > +        uint16        infomask2;
> > +        bool        require_sleep;
> >  
> >          /* must copy state data before unlocking buffer */
> >          xwait = HeapTupleHeaderGetXmax(tuple->t_data);
> >          infomask = tuple->t_data->t_infomask;
> > +        infomask2 = tuple->t_data->t_infomask2;
> >  
> >          LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
> >  
> >          /*
> > -         * If we wish to acquire share lock, and the tuple is already
> > -         * share-locked by a multixact that includes any subtransaction of 
> > the
> > -         * current top transaction, then we effectively hold the desired 
> > lock
> > -         * already.  We *must* succeed without trying to take the tuple 
> > lock,
> > -         * else we will deadlock against anyone waiting to acquire 
> > exclusive
> > -         * lock.  We don't need to make any state changes in this case.
> > +         * If we wish to acquire share or key lock, and the tuple is 
> > already
> > +         * key or share locked by a multixact that includes any 
> > subtransaction
> > +         * of the current top transaction, then we effectively hold the 
> > desired
> > +         * lock already (except if we own key share lock and now desire 
> > share
> > +         * lock).  We *must* succeed without trying to take the tuple lock,
> 
> This can now apply to FOR UPDATE as well.
> 
> For the first sentence, I suggest the wording "If any subtransaction of the
> current top transaction already holds a stronger lock, we effectively hold the
> desired lock already."

I settled on this:

                /*
                 * If any subtransaction of the current top transaction already 
holds a
                 * lock as strong or stronger than what we're requesting, we
                 * effectively hold the desired lock already.  We *must* succeed
                 * without trying to take the tuple lock, else we will deadlock 
against
                 * anyone wanting to acquire a stronger lock.
                 */
                if (infomask & HEAP_XMAX_IS_MULTI)
                {
                        int             i;
                        int             nmembers;
                        MultiXactMember *members;
                        MultiXactStatus cutoff = 
get_mxact_status_for_tuplelock(mode);

                        nmembers = GetMultiXactIdMembers(xwait, &members);

                        for (i = 0; i < nmembers; i++)
                        {
                                if 
(TransactionIdIsCurrentTransactionId(members[i].xid))
                                {
                                        if (members[i].status >= cutoff)
                                        {
                                                if (have_tuple_lock)
                                                        
UnlockTupleTuplock(relation, tid, mode);
                
                                                pfree(members);
                                                return HeapTupleMayBeUpdated;
                                        }
                                }
                        }

                        pfree(members);
                }

Now, I can't see the reason that we didn't previously consider locks "as
strong as what we're requesting" ... but surely it's the same case?

> > +         * else we will deadlock against anyone wanting to acquire a 
> > stronger
> > +         * lock.
> 
> > +         *
> > +         * FIXME -- we don't do the below currently, but I think we should:
> > +         *
> > +         * We update the Xmax with a new MultiXactId to include the new 
> > lock
> > +         * mode in this case.
> > +         *
> > +         * Note that since we want to alter the Xmax, we need to 
> > re-acquire the
> > +         * buffer lock.  The xmax could have changed in the meantime, so we
> > +         * recheck it in that case, but we keep the buffer lock while 
> > doing it
> > +         * to prevent starvation.  The second time around we know we must 
> > be
> > +         * part of the MultiXactId in any case, which is why we don't need 
> > to
> > +         * go back to recheck HeapTupleSatisfiesUpdate.  Also, after we
> > +         * re-acquire lock, the MultiXact is likely to (but not 
> > necessarily) be
> > +         * the same that we see here, so it should be in multixact's cache 
> > and
> > +         * thus quick to obtain.
> 
> What is the benefit of doing so?

After thinking more about it, I think it's bogus.  I've removed it.


> Incidentally, why is this level of xlog detail needed for tuple locks?  We
> need an FPI of the page before the lock-related changes start scribbling on
> it, and we need to log any xid, even that of a locker, that could land in the
> heap on disk.  But, why do we actually need to replay each lock?

Uhm.  I remember thinking that a hot standby replica needed it ...

> > + * slightly incorrect, because lockers whose status did not conflict with 
> > ours
> > + * are not even considered and so might have gone away anyway.
> >   *
> >   * But by the time we finish sleeping, someone else may have changed the 
> > Xmax
> >   * of the containing tuple, so the caller needs to iterate on us somehow.
> >   */
> >  void
> > -MultiXactIdWait(MultiXactId multi)
> > +MultiXactIdWait(MultiXactId multi, MultiXactStatus status, int *remaining)
> 
> This function should probably move (with a new name) to heapam.c (or maybe
> lmgr.c, in part).  It's an abstraction violation to have multixact.c knowing
> about lock conflict tables.  multixact.c should be marshalling those two bits
> alongside each xid without any deep knowledge of their meaning.

Interesting thought.


> >      /*
> > -     * Also truncate MultiXactMember at the previously determined offset.
> > +     * FIXME there's a race condition here: somebody might have created a 
> > new
> > +     * segment after we finished scanning the dir.  That scenario would 
> > leave
> > +     * us with an invalid truncateXid in shared memory, which is not an 
> > easy
> > +     * situation to get out of.  Needs more thought.
> 
> Agreed.  Not sure.
> 
> Broadly, this feels like a lot of code to handle truncating the segments, but
> I don't know how to simplify it.

It is a lot of code.  And it took me quite a while to even figure out
how to do it.  I don't see any other way to go about it.


> > +        xmax = HeapTupleGetUpdateXid(tuple);
> > +        if (TransactionIdIsCurrentTransactionId(xmax))
> > +        {
> > +            if (HeapTupleHeaderGetCmax(tuple) >= snapshot->curcid)
> > +                return true;    /* deleted after scan started */
> > +            else
> > +                return false;    /* deleted before scan started */
> > +        }
> > +        if (TransactionIdIsInProgress(xmax))
> > +            return true;
> > +        if (TransactionIdDidCommit(xmax))
> > +        {
> > +            SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED, xmax);
> > +            /* updating transaction committed, but when? */
> > +            if (XidInMVCCSnapshot(xmax, snapshot))
> > +                return true;    /* treat as still in progress */
> > +            return false;
> > +        }
> 
> In both HEAP_XMAX_MULTI conditional blocks, you do not set HEAP_XMAX_INVALID
> for an aborted updater.  What is the new meaning of HEAP_XMAX_INVALID for
> multixacts?  What implications would arise if we instead had it mean that the
> updating xid is aborted?  That would allow us to get the mid-term performance
> benefit of the hint bit when the updating xid spills into a multixact, and it
> would reduce code duplication in this function.

Well, HEAP_XMAX_INVALID means the Xmax is not valid, period.  If there's
a multi whose updater is aborted, there's still a multi that needs to be
checked in various places, so we cannot set that bit.


> I did not review the other tqual.c changes.  Could you summarize how the
> changes to the other functions must differ from the changes to
> HeapTupleSatisfiesMVCC()?

I don't think they should differ in any significant way ... if they do,
it's probably bogus.  Only HeapTupleSatisfiesVacuum should differ
significantly, because it's a world on its own.


> > --- a/src/bin/pg_resetxlog/pg_resetxlog.c
> > +++ b/src/bin/pg_resetxlog/pg_resetxlog.c
> > @@ -332,6 +350,11 @@ main(int argc, char *argv[])
> >      if (set_mxoff != -1)
> >          ControlFile.checkPointCopy.nextMultiOffset = set_mxoff;
> >  
> > +    /*
> > +    if (set_mxfreeze != -1)
> > +        ControlFile.checkPointCopy.mxactFreezeXid = set_mxfreeze;
> > +        */
> > +
> >      if (minXlogTli > ControlFile.checkPointCopy.ThisTimeLineID)
> >          ControlFile.checkPointCopy.ThisTimeLineID = minXlogTli;
> >  
> > @@ -578,6 +601,10 @@ PrintControlValues(bool guessed)
> >             ControlFile.checkPointCopy.nextMulti);
> >      printf(_("Latest checkpoint's NextMultiOffset:  %u\n"),
> >             ControlFile.checkPointCopy.nextMultiOffset);
> > +    /*
> > +    printf(_("Latest checkpoint's MultiXact freezeXid: %u\n"),
> > +           ControlFile.checkPointCopy.mxactFreezeXid);
> > +           */
> 
> Should these changes be live?  They look reasonable at first glance.

Oh, I forgot about these.  Yeah, these need to be live, but not in the
exact for they have here; there were some tweaks I needed to do IIRC.

> >  /*
> > + * A tuple is only locked (i.e. not updated by its Xmax) if it the Xmax is 
> > not
> > + * a multixact and it has either the EXCL_LOCK or KEYSHR_LOCK bits set, or 
> > if
> > + * the xmax is a multi that doesn't contain an update.
> > + *
> > + * Beware of multiple evaluation of arguments.
> > + */
> > +#define HeapTupleHeaderInfomaskIsLocked(infomask) \
> > +    ((!((infomask) & HEAP_XMAX_IS_MULTI) && \
> > +      (infomask) & (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)) || \
> > +     (((infomask) & HEAP_XMAX_IS_MULTI) && ((infomask) & 
> > HEAP_XMAX_IS_NOT_UPDATE)))
> > +
> > +#define HeapTupleHeaderIsLocked(tup) \
> > +    HeapTupleHeaderInfomaskIsLocked((tup)->t_infomask)
> 
> I'm uneasy having a HeapTupleHeaderIsLocked() that returns false when a tuple
> is both updated and KEY SHARE-locked.  Perhaps HeapTupleHeaderIsUpdated() with
> the opposite meaning, or HeapTupleHeaderIsOnlyLocked()?

I had the IsOnlyLocked thought too.  I will go that route.

(I changed HeapTupleHeaderGetXmax to GetRawXmax, thanks for that
suggestion)

-- 
Álvaro Herrera <alvhe...@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to