Andres Freund wrote:
> On 2013-12-11 14:00:05 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > On 2013-12-09 19:14:58 -0300, Alvaro Herrera wrote:
> > > > Andres mentioned the idea of sharing some code between
> > > > heap_prepare_freeze_tuple and heap_tuple_needs_freeze, but I haven't
> > > > explored that.
> > >
> > > My idea would just be to have heap_tuple_needs_freeze() call
> > > heap_prepare_freeze_tuple() and check whether it returns true. Yes,
> > > that's slightly more expensive than the current
> > > heap_tuple_needs_freeze(), but it's only called when we couldn't get a
> > > cleanup lock on a page, so that seems ok.
> >
> > Doesn't seem a completely bad idea, but let's leave it for a separate
> > patch. This should be changed in master only IMV anyway, while the rest
> > of this patch is to be backpatched to 9.3.
>
> I am not so sure it shouldn't be backpatched together with this. We now
> have similar complex logic in both functions.

Any other opinions on this?
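For concreteness, the wrapper Andres describes would be about this much
code. Untested sketch only, assuming the xl_heap_freeze_tuple scratch
struct this patch adds to heapam_xlog.h:

#include "access/heapam.h"
#include "access/heapam_xlog.h"		/* xl_heap_freeze_tuple, per this patch */

bool
heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
						MultiXactId cutoff_multi, Buffer buf)
{
	xl_heap_freeze_tuple frz;	/* scratch state; computed and discarded */

	/*
	 * heap_prepare_freeze_tuple() returns true iff it set up some change
	 * to the tuple, which is exactly the "needs freezing" condition.
	 * Computing state we then throw away is the "slightly more expensive"
	 * part, but we only get here when the cleanup lock on the page
	 * couldn't be obtained.  (buf is unused in this sketch.)
	 */
	return heap_prepare_freeze_tuple(tuple, cutoff_xid, cutoff_multi, &frz);
}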
> > > > !	if (ISUPDATE_from_mxstatus(members[i].status) &&
> > > > !		!TransactionIdDidAbort(members[i].xid))
> > >
> > > It makes me wary to see a DidAbort() without a previous InProgress()
> > > call. Also, after we crashed, doesn't DidAbort() possibly return
> > > false for transactions that were in progress before we crashed? At
> > > least that's how I always understood it, and how tqual.c is written.
> >
> > Yes, that's correct. But note that here we're not doing a tuple
> > liveliness test, which is what tqual.c is doing. What we do with this
> > info is to keep the Xid as part of the multi if it's still running or
> > committed. We also keep it if the xact crashed, which is fine because
> > the Xid will be removed by some later step. If we know for certain
> > that the update Xid is aborted, then we can ignore it, but this is
> > just an optimization and not needed for correctness.
>
> But why deviate that way? It doesn't seem to save us much?

Well, it does save something -- there not being a live update means we
are likely to be able to invalidate the Xmax completely if there are no
lockers; and even in the case where there are lockers, we will be able
to set LOCK_ONLY, which means faster access in several places.

> > > >   	if (tuple->t_infomask & HEAP_MOVED_OFF)
> > > > ! 		frz->frzflags |= XLH_FREEZE_XVAC;
> > > >   	else
> > > > ! 		frz->frzflags |= XLH_INVALID_XVAC;
> > >
> > > Hm. Isn't this case inverted? I.e. shouldn't you set XLH_FREEZE_XVAC
> > > and XLH_INVALID_XVAC exactly the other way round? I really don't
> > > understand the moved in/off, since the code has been gone longer than
> > > I've followed the code...
> >
> > Yep, fixed.
>
> I wonder how many of the HEAP_MOVED_* cases around are actually
> correct... What was the last version those were generated? 8.4?

8.4, yeah, before VACUUM FULL got rewritten. I don't think anybody
tests these code paths, because doing so requires a database that was
upgraded straight from 8.4 and saw VACUUM FULL executed while still on
8.4. I think we should consider removing these things, or at least have
some mechanism to ensure they don't survive from pre-9.0 installs.
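For the archives, the corrected branch is oriented like this -- a
sketch reconstructed from the fix above, not a verbatim hunk from the
patch. A MOVED_OFF tuple that is still around means the old-style
VACUUM FULL transaction (xvac) must have aborted, so the xvac is
invalidated; a surviving MOVED_IN tuple means the xvac committed, so it
can be frozen:

	if (tuple->t_infomask & HEAP_MOVED_OFF)
		frz->frzflags |= XLH_INVALID_XVAC;
	else
		frz->frzflags |= XLH_FREEZE_XVAC;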
> > (I was toying with the "desc" code because it misbehaves when applied
> > to records as they are created, as opposed to being applied to records
> > as they are replayed. I'm pretty sure everyone already knows about
> > this, and it's the reason why everybody has skimped on examining
> > arrays of things stored in followup data records. I was naive enough
> > to write code that tries to decode the followup record that contains
> > the members of the multixact we're creating, which works fine during
> > replay but gets them completely wrong during regular operation. This
> > is the third time I'm surprised by this misbehavior; blame my bad
> > memory for not remembering that it's not supposed to work in the
> > first place.)
>
> I am not really sure what you are talking about. That you cannot
> properly decode records before they have been processed by
> XLogInsert()? If so, yes, that's pretty clear and I am pretty sure it
> will break in lots of places if you try?

Well, I'm not sure about "lots of places". The only misbehavior I have
seen is in those desc routines. Of course, the redo routines might also
fail, but then there's no way for them to be running ...

> > Right now there is one case in this code that returns
> > FRM_INVALIDATE_XMAX when it's not strictly necessary, i.e. it would
> > also work to keep the Multi as is and return FRM_NOOP instead; and it
> > also returns FRM_NOOP in one case when we could return
> > FRM_INVALIDATE_XMAX instead. Neither does any great damage, but there
> > is a consideration that future examiners of the tuple would have to
> > resolve the MultiXact by themselves (==> performance hit). On the
> > other hand, returning INVALIDATE causes the block to be dirtied, which
> > is undesirable if it's not already dirty.
>
> Otherwise it will be marked dirty the next time something reads the
> page, so I don't think this is problematic.

Not necessarily. I mean, if somebody sees a multi, they might just
resolve it to its members and otherwise leave the page alone. Or in
some cases not even resolve it to its members (if it's LOCK_ONLY and
old enough to be behind the oldest visible multi).

> > !	{
> > !		if (ISUPDATE_from_mxstatus(members[i].status))
> > !		{
> > !			/*
> > !			 * It's an update; should we keep it? If the transaction is
> > !			 * known aborted then it's okay to ignore it, otherwise not.
> > !			 * (Note this is just an optimization and not needed for
> > !			 * correctness, so it's okay to get this test wrong; for
> > !			 * example, in case an updater is crashed, or a running
> > !			 * transaction in the process of aborting.)
> > !			 */
> > !			if (!TransactionIdDidAbort(members[i].xid))
> > !			{
> > !				newmembers[nnewmembers++] = members[i];
> > !				Assert(!TransactionIdIsValid(update_xid));
> > !
> > !				/*
> > !				 * Tell caller to set HEAP_XMAX_COMMITTED hint while we
> > !				 * have the Xid in cache. Again, this is just an
> > !				 * optimization, so it's not a problem if the transaction
> > !				 * is still running and in the process of committing.
> > !				 */
> > !				if (TransactionIdDidCommit(update_xid))
> > !					update_committed = true;
> > !
> > !				update_xid = newmembers[i].xid;
> > !			}
>
> I don't think the conclusions here are correct - we might be setting
> HEAP_XMAX_COMMITTED a smidge too early that way. If the updating
> transaction is in progress, there's the situation that we have updated
> the clog, but have not yet removed ourselves from the procarray. I.e. a
> situation in which TransactionIdIsInProgress() and
> TransactionIdDidCommit() both return true. Afaik it is only correct to
> set HEAP_XMAX_COMMITTED once TransactionIdIsInProgress() returns false.

Hmm ... Is there an actual difference? I mean, a transaction that
marked itself as committed in pg_clog cannot return to any other state,
regardless of what happens elsewhere.
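For reference, the ordering Andres describes -- the one tqual.c-style
code follows -- would look about like this (sketch only, reusing the
update_xid/update_committed variables from the hunk above;
TransactionIdIsInProgress() is declared in storage/procarray.h and
TransactionIdDidCommit() in access/transam.h):

	/*
	 * In the window between a backend updating pg_clog and removing
	 * itself from the procarray, TransactionIdIsInProgress() and
	 * TransactionIdDidCommit() can both return true.  Checking
	 * InProgress first means the hint bit is only set once nobody can
	 * still consider the xact to be running.
	 */
	if (!TransactionIdIsInProgress(update_xid) &&
		TransactionIdDidCommit(update_xid))
		update_committed = true;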
> > !			/*
> > !			 * Checking for very old update Xids is critical: if the
> > !			 * update member of the multi is older than cutoff_xid, we
> > !			 * must remove it, because otherwise a later liveliness check
> > !			 * could attempt pg_clog access for a page that was truncated
> > !			 * away by the current vacuum. Note that if the update had
> > !			 * committed, we wouldn't be freezing this tuple because it
> > !			 * would have gotten removed (HEAPTUPLE_DEAD) by
> > !			 * HeapTupleSatisfiesVacuum; so it either aborted or crashed.
> > !			 * Therefore, ignore this update_xid.
> > !			 */
> > !			if (TransactionIdPrecedes(update_xid, cutoff_xid))
> > !			{
> > !				update_xid = InvalidTransactionId;
> > !				update_committed = false;
>
> I vote for an Assert(!TransactionIdDidCommit(update_xid)) here.

Will add.

> > !	else
> > !	{
> > !		/*
> > !		 * Create a new multixact with the surviving members of the
> > !		 * previous one, to set as new Xmax in the tuple.
> > !		 *
> > !		 * If this is the first possibly-multixact-able operation in the
> > !		 * current transaction, set my per-backend OldestMemberMXactId
> > !		 * setting. We can be certain that the transaction will never
> > !		 * become a member of any older MultiXactIds than that.
> > !		 */
> > !		MultiXactIdSetOldestMember();
> > !		xid = MultiXactIdCreateFromMembers(nnewmembers, newmembers);
> > !		*flags |= FRM_RETURN_IS_MULTI;
> > !	}
>
> I worry that this MultiXactIdSetOldestMember() will be problematic in
> long-running vacuums like an anti-wraparound vacuum of a huge table.
> There's no real need to set MultiXactIdSetOldestMember() here, since we
> will not become a member of a multi. So I think you should either move
> the Assert() in MultiXactIdCreateFromMembers() to its other callers, or
> add a parameter to skip it.

I would like to have the Assert() work automatically, that is, check
the PROC_IN_VACUUM flag in MyProc->vacuumFlags ... but this probably
won't work with CLUSTER. That said, I think we *should* call
SetOldestMember in CLUSTER. So maybe both things should be conditional
on PROC_IN_VACUUM. (Either way it will be ugly.)

> > ! /*
> > !  * heap_prepare_freeze_tuple
> >      *
> >      * Check to see whether any of the XID fields of a tuple (xmin,
> >      * xmax, xvac)
> > !  * are older than the specified cutoff XID and cutoff MultiXactId.
> > !  * If so, set up enough state (in the *frz output argument) to later
> > !  * execute and WAL-log what we would need to do, and return TRUE.
> > !  * Return FALSE if nothing is to be changed.
> > !  *
> > !  * Caller is responsible for setting the offset field, if
> > !  * appropriate. This is only necessary if the freeze is to be
> > !  * WAL-logged.
>
> I'd leave off that second sentence; if you want to freeze a whole page
> but not WAL-log it, you'd need to set offset as well...

I can buy that.

> I worry that all these multixact accesses will create huge performance
> problems due to the inefficiency of the multixactid cache. If you scan
> a huge table there very well might be millions of different multis we
> touch, and afaics most of them will end up in the multixactid cache.
> That can't end well.
> I think we need to either regularly delete that cache when it goes
> past, say, 100 entries, or just bypass it entirely.

Delete the whole cache, or just prune it of the least recently used
entries? Maybe the cache should be a dlist instead of the open-coded
stuff that's there now; that would enable pruning of the oldest
entries. I think a blanket deletion might be a cure worse than the
disease. I see your point anyhow. A rough shape of what I mean is
sketched below.
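To make that concrete, something along these lines using lib/ilist.h.
The entry layout, the allocation context, and the 100-entry cap are all
illustrative only, not what multixact.c has today:

#include "postgres.h"
#include "access/multixact.h"	/* MultiXactId, MultiXactMember */
#include "lib/ilist.h"			/* dlist_* */
#include "utils/memutils.h"		/* MemoryContextAlloc, TopTransactionContext */

#define MXACT_CACHE_MAX 100		/* illustrative cap */

typedef struct mXactCacheEnt
{
	dlist_node	node;			/* list links; head = most recently used */
	MultiXactId multi;
	int			nmembers;
	MultiXactMember members[FLEXIBLE_ARRAY_MEMBER];
} mXactCacheEnt;

static dlist_head MXactCache = DLIST_STATIC_INIT(MXactCache);
static int	MXactCacheEntries = 0;

static void
mXactCachePut(MultiXactId multi, int nmembers, MultiXactMember *members)
{
	mXactCacheEnt *entry;

	/* cache lives for the duration of the transaction */
	entry = (mXactCacheEnt *)
		MemoryContextAlloc(TopTransactionContext,
						   offsetof(mXactCacheEnt, members) +
						   nmembers * sizeof(MultiXactMember));
	entry->multi = multi;
	entry->nmembers = nmembers;
	memcpy(entry->members, members, nmembers * sizeof(MultiXactMember));

	/* newest entry goes to the head */
	dlist_push_head(&MXactCache, &entry->node);

	if (++MXactCacheEntries > MXACT_CACHE_MAX)
	{
		/* prune the least recently used entry from the tail */
		dlist_node *oldest = dlist_tail_node(&MXactCache);

		dlist_delete(oldest);
		MXactCacheEntries--;
		pfree(dlist_container(mXactCacheEnt, node, oldest));
	}
}

On a cache hit, the lookup side would dlist_move_head() the entry, so
the tail always holds the least recently used one.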
-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services