On 08.10.2018 18:24, Andres Freund wrote:
On October 8, 2018 2:04:28 AM PDT, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:
On 05.10.2018 11:04, Michael Paquier wrote:
On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
As you can see, XID 2004495308 is encountered twice, which causes an
error in KnownAssignedXidsAdd:
    if (head > tail &&
        TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1], from_xid))
    {
        KnownAssignedXidsDisplay(LOG);
        elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
    }
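To illustrate why a duplicate XID trips this check, here is a minimal
standalone sketch. It is not the PostgreSQL source: xid_follows_or_equals
is a simplified stand-in for TransactionIdFollowsOrEquals (modulo-2^32
comparison for normal XIDs), and insertion_is_out_of_order is a
hypothetical helper that mirrors only the condition quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumption: XIDs below 3 are special (invalid, bootstrap, frozen). */
#define FIRST_NORMAL_XID ((TransactionId) 3)

/*
 * Simplified stand-in for TransactionIdFollowsOrEquals: true if id1 is
 * logically >= id2. Normal XIDs are compared modulo 2^32 so that the
 * comparison survives wraparound.
 */
static bool
xid_follows_or_equals(TransactionId id1, TransactionId id2)
{
    if (id1 < FIRST_NORMAL_XID || id2 < FIRST_NORMAL_XID)
        return id1 >= id2;

    return (int32_t) (id1 - id2) >= 0;
}

/*
 * Hypothetical helper: true if appending from_xid after last_xid (the
 * entry at head - 1) would be rejected as out-of-order.
 */
static bool
insertion_is_out_of_order(TransactionId last_xid, TransactionId from_xid)
{
    return xid_follows_or_equals(last_xid, from_xid);
}
```

Because the comparison is "follows or equals", adding an XID equal to the
last stored one is rejected in the same way as adding an older one.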
The probability of this error is very small, but it can be reproduced
quite easily: just set a breakpoint in the debugger after the call to
MarkAsPrepared in twophase.c and then try to prepare any transaction.
MarkAsPrepared will add the GXACT to the proc array, and at this moment
there will be two entries in procarray with the same XID:
[snip]
Now the generated RUNNING_XACTS record contains duplicated XIDs.
So, I have been doing exactly that, and if you trigger a manual
checkpoint then things happen quite correctly if you let the first
session finish:
rmgr: Standby len (rec/tot): 58/ 58, tx: 0, lsn:
0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606
If you keep the debugger stopped after the call to MarkAsPrepared, then
the manual checkpoint blocks. Now if you keep the debugger stopped and
wait for a checkpoint timeout to happen, then I can see the incorrect
record. It is impressive that your customer was able to see that first,
and then that you were able to get into that state with simple steps.
I would like to ask the community's opinion about the best way to fix
this problem. Should we avoid storing duplicated XIDs in procarray (by
invalidating the XID in the original pgxact) or eliminate/change the
check for duplicates in KnownAssignedXidsAdd (for example, just ignore
duplicates)?
Hmmmmm... Please let me think through that first. It seems to me that
the record should not be generated to begin with. At least I am able to
confirm what you see.
The simplest way to fix the problem is to ignore duplicates before
adding them to KnownAssignedXids.
We perform a sort at this point in any case...
I vehemently object to that as the proper course.
And what about adding a qsort to GetRunningTransactionData or
LogCurrentRunningXacts and excluding duplicates there?
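
A minimal standalone sketch of what "qsort and exclude duplicates" could
look like on a plain array of XIDs. This is not a patch: the names
xid_cmp and sort_and_dedup_xids are hypothetical, and the sketch assumes
plain numeric ordering, which is enough to make duplicates adjacent even
though it ignores XID wraparound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Plain numeric comparator for qsort (hypothetical helper). */
static int
xid_cmp(const void *a, const void *b)
{
    TransactionId x = *(const TransactionId *) a;
    TransactionId y = *(const TransactionId *) b;

    return (x > y) - (x < y);
}

/*
 * Sort xids[0..n-1] in place and squeeze out duplicates.
 * Returns the number of distinct XIDs left at the front of the array.
 */
static size_t
sort_and_dedup_xids(TransactionId *xids, size_t n)
{
    size_t kept;
    size_t i;

    assert(xids != NULL || n == 0);

    if (n < 2)
        return n;

    qsort(xids, n, sizeof(TransactionId), xid_cmp);

    /* After sorting, equal XIDs are adjacent; keep the first of each run. */
    kept = 1;
    for (i = 1; i < n; i++)
    {
        if (xids[i] != xids[kept - 1])
            xids[kept++] = xids[i];
    }

    return kept;
}
```

For example, calling it on the three XIDs 607, 606, 607 leaves 606 and
607 at the front of the array and returns 2.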
Andres
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company