Re: Race condition in TransactionIdIsInProgress

Andres Freund Sat, 12 Feb 2022 13:50:59 -0800

Hi,

On 2022-02-12 13:25:58 +0000, Simon Riggs wrote:
> On Fri, 11 Feb 2022 at 19:08, Andres Freund <and...@anarazel.de> wrote:
> 
> > On 2022-02-11 13:48:59 +0000, Simon Riggs wrote:
> > > On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.ri...@enterprisedb.com> 
> > > wrote:
> > > >
> > > > On Fri, 11 Feb 2022 at 06:11, Andres Freund <and...@anarazel.de> wrote:
> > > >
> > > > > Looks lik syncrep will make this a lot worse, because it can 
> > > > > drastically
> > > > > increase the window between the TransactionIdCommitTree() and
> > > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  
> > > > > But at
> > > > > least it might make it easier to write tests exercising this 
> > > > > scenario...
> > > >
> > > > Agreed
> > > >
> > > > TransactionIdIsKnownCompleted(xid) is only broken because the single
> > > > item cache is set too early in some cases. The single item cache is
> > > > important for performance, so we just need to be more careful about
> > > > setting the cache.
> > >
> > > Something like this... fix_cachedFetchXid.v1.patch prevents the cache
> > > being set, but this fails! Not worked out why, yet.
> >
> > I don't think it's safe to check !TransactionIdKnownNotInProgress() in
> > TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
> > do TransactionLogFetch() internally.
> 
> That's not correct because you're confusing
> TransactionIdKnownNotInProgress(), which is a new function introduced
> by the patch, with the existing function
> TransactionIdIsKnownCompleted().


I don't think so. This call of the new TransactionIdKnownNotInProgress()

> --- a/src/backend/access/transam/transam.c
> +++ b/src/backend/access/transam/transam.c
> @@ -73,6 +73,14 @@ TransactionLogFetch(TransactionId transactionId)
>               return TRANSACTION_STATUS_ABORTED;
>       }
>  
> +     /*
> +      * Safeguard that we have called TransactionIsIdInProgress() before
> +      * checking commit log manager, to ensure that we do not cache the
> +      * result until the xid is no longer in the procarray at eoxact.
> +      */
> +     if (!TransactionIdKnownNotInProgress(transactionId))
> +             return TRANSACTION_STATUS_IN_PROGRESS;
> +
>       /*
>        * Get the transaction status.
>        */

isn't right. Consider what happens to TransactionIdIsInProgress(x)'s call to
TransactionIdDidAbort(x) at "step 4". TransactionIdDidAbort(x) will call
TransactionLogFetch(x). cachedXidIsNotInProgress isn't yet set to x, so
TransactionLogFetch() returns TRANSACTION_STATUS_IN_PROGRESS. Even if the
(sub)transaction aborted.

Which explains why your first patch doesn't work.


> > > just_remove_TransactionIdIsKnownCompleted_call.v1.patch
> >
> > I think this might contain a different diff than what the name suggests?
> 
> Not at all, please don't be confused by the name. The patch removes
> the call to TransactionIdIsKnownCompleted() from
> TransactionIdIsInProgress().

I guess for me "just remove" doesn't include adding a new cache...


> I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
> in backbranches.

I think it'd be fine if we needed to. Or we could just make it always return
false or such.


> > > just removes the known offending call, passes make check, but IMHO
> > > leaves the same error just as likely by other callers.
> >
> > Which other callers are you referring to?
> 
> I don't know of any, but we can't just remove a function in a
> backbranch, can we?

We have in the past, if it's a "sufficiently internal" function.


Greetings,

Andres Freund

Re: Race condition in TransactionIdIsInProgress

Reply via email to