On 28/11/2023 07:41, Wirch, Eduard wrote:
ERROR: could not serialize access due to read/write dependencies among
transactions
Detail: Reason code: Canceled on identification as a pivot, with
conflict out to old committed transaction 61866959.
There is a variation of the error:
PSQLException: ERROR: could not serialize access due to read/write
dependencies among transactions
Detail: Reason code: Canceled on conflict out to old pivot 61940806.
Both of these errors are coming from CheckForSerializableConflictOut(),
and are indeed variations of the same kind of conflict.
We're logging the id, begin, and end of every transaction. Transaction
61940806 was committed without errors. The transaction responsible for
the above error was started 40 minutes later (and failed immediately).
With 61866959 it is even more extreme: the first conflict error occurred
2.5 hours after 61866959 was committed.
Weird indeed. There is only one caller of
CheckForSerializableConflictOut(), and it does this:
    /*
     * Find top level xid.  Bail out if xid is too early to be a conflict,
     * or if it's our own xid.
     */
    if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
        return;
    xid = SubTransGetTopmostTransaction(xid);
    if (TransactionIdPrecedes(xid, TransactionXmin))
        return;

    CheckForSerializableConflictOut(relation, xid, snapshot);
That check with TransactionXmin is very clear: if 'xid' precedes the
xmin of the current transaction, in other words if no transactions with
'xid' or older were still running when the current transaction started,
CheckForSerializableConflictOut() is not called.
The DB table access pattern is too complex to lay out here. There are
around 20 tables that are read from and written to. Transactions are
usually short-lived; the longest transaction that could occur is 1
minute long. My understanding of serializable isolation is that only
overlapping transactions can conflict (the classic pattern is sketched
below). I can be pretty sure that in the above cases there is no single
transaction that overlaps both 61940806 and the failing transaction 40
minutes later.
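For reference, the kind of overlap that normally produces this error is
the classic write-skew pattern. A minimal sketch, assuming a
hypothetical table t(id int, val int) and two concurrent sessions:

    -- Session A
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT sum(val) FROM t;        -- A reads the whole table

    -- Session B, while A's transaction is still open
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT sum(val) FROM t;        -- B reads the whole table
    INSERT INTO t VALUES (2, 10);  -- B writes rows matching A's read
    COMMIT;                        -- B commits successfully

    -- Back in session A
    INSERT INTO t VALUES (1, 10);  -- A writes rows matching B's read
    COMMIT;  -- fails here (or already at the INSERT) with:
    -- ERROR: could not serialize access due to read/write dependencies
    -- among transactions

Both transactions were open at the same time; without that overlap, no
rw-dependency can form directly between them.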
I hate to drill on this, but are you very sure about that? I don't see
how this could happen if there are no long-running transactions. Maybe a
forgotten two-phase commit transaction? A transaction in a different
database? A developer who did "begin;" in psql and went for lunch?
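If you want to rule those out, a couple of quick checks along these
lines might help (just a sketch; pg_stat_activity and pg_prepared_xacts
are the places to look):

    -- Sessions with a transaction open, oldest first. An "idle in
    -- transaction" session that went to lunch shows up here:
    SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_start;

    -- Forgotten two-phase commit transactions:
    SELECT gid, prepared, owner, database FROM pg_prepared_xacts;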
Such long-running transactions would cause different types of errors in
our system ("out of shared memory", "You might need to increase
max_pred_locks_per_transaction").
I don't see why that would necessarily be the case, unless it's
something very specific to your application.
Why does PostgreSQL detect a conflict with a transaction that was
committed more than an hour before? Can there be a long dependency chain
between many short-running transactions? Does the high load prevent
Postgres from doing some cleanup?
The dependencies don't chain like that, but there is a system of
"summarizing" old transactions to limit shared memory usage. When a
transaction has dependencies on other transactions, we track those
dependencies in shared memory. If we run short on the space reserved
for that, we summarize the dependencies, losing granularity: we lose
the information about which relations/pages/tuples the xid accessed,
and exactly which transactions it had a dependency on. That is safe,
but can cause false positives.
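You can get a rough sense of the pressure on that space by looking at
the predicate (SIRead) locks in pg_locks. This is only a sketch, but
coarse entries at page or relation granularity are a hint that
fine-grained tracking is being given up:

    SELECT locktype, relation::regclass, page, tuple, virtualtransaction
    FROM pg_locks
    WHERE mode = 'SIReadLock'
    ORDER BY locktype;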
The amount of shared memory reserved for tracking the dependencies is
determined by max_pred_locks_per_transaction, so you could try
increasing that to reduce those false positives, even if you never get
the "out of shared memory" error.
--
Heikki Linnakangas
Neon (https://neon.tech)