Re: [HACKERS] Hot Standby: too many KnownAssignedXids

Heikki Linnakangas Wed, 24 Nov 2010 02:48:56 -0800

On 24.11.2010 06:56, Joachim Wieland wrote:

On Tue, Nov 23, 2010 at 8:45 AM, Heikki Linnakangas
<[email protected]>  wrote:

On 19.11.2010 23:46, Joachim Wieland wrote:


FATAL:  too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
pArray->maxKnownAssignedXids: 6890


Hmm, that's a lot of entries in KnownAssignedXids.

Can you recompile with WAL_DEBUG, and run the recovery again with
wal_debug=on ? That will print all the replayed WAL records, which is a lot
of data, but it might give a hint what's going on.


Sure, but this gives me only one more line:

[...]
LOG:  redo starts at 1F8/FC00E978
LOG:  REDO @ 1F8/FC00E978; LSN 1F8/FC00EE90: prev 1F8/FC00E930; xid
385669; len 21; bkpb1: Heap - insert: rel 1663/16384/18373; tid
3829898/23
FATAL:  too many KnownAssignedXids
CONTEXT:  xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG:  startup process (PID 4587) exited with exit code 1
LOG:  terminating any other active server processes

Thanks, I can reproduce this now. This happens when you have a wide gap between the oldest still active xid and the latest xid.

When recovery starts, we fetch the oldestActiveXid from the checkpoint record. Let's say that it's 100. We then start replaying WAL records from the Redo pointer, and the first record (heap insert in your case) contains an Xid that's much larger than 100, say 10000. We call RecordKnownAssignedXids() to make note that all xids between that range are in-progress, but there isn't enough room in the array for that.

We normally get away with a smallish array because the array is trimmed at commit and abort records, and the special xid-assignment record to handle the case of a lot of subtransactions. We initialize the array from the running-xacts record that's written at a checkpoint. That mechanism fails in this case because the heap insert record is seen before the running-xacts record, causing all those xids in the range 100-10000 to be considered in-progress. The running-xacts record that comes later would prune them, but we don't have enough slots to hold them until that.
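
To make the arithmetic concrete, here is a toy model of the failure in Python. It is only a sketch of the behaviour described above, not the real array code: the class and method names are made up for illustration, 6890 is the array size from the error message, and 100 and 10000 are the example xids from the explanation.

```python
# Toy model of the overflow. All names are hypothetical; this is not
# how the real KnownAssignedXids array is implemented.

class TooManyKnownAssignedXids(Exception):
    pass


class KnownAssignedXidsModel:
    def __init__(self, max_slots, oldest_active_xid):
        self.max_slots = max_slots
        self.xids = set()
        # Assumption: everything below oldestActiveXid is already finished.
        self.latest_observed = oldest_active_xid - 1

    def record(self, xid):
        """Mark every xid between the last observed one and xid as in-progress."""
        if xid <= self.latest_observed:
            return
        new = range(self.latest_observed + 1, xid + 1)
        if len(self.xids) + len(new) > self.max_slots:
            raise TooManyKnownAssignedXids(
                f"need {len(self.xids) + len(new)} slots, have {self.max_slots}"
            )
        self.xids.update(new)
        self.latest_observed = xid

    def remove(self, xid):
        """Commit or abort record: the xid is no longer in progress."""
        self.xids.discard(xid)

    def prune_to(self, running_xids):
        """Running-xacts record: keep only the xids it lists as running."""
        self.xids &= set(running_xids)


def replay_first_insert(max_slots=6890, oldest_active_xid=100, first_xid=10000):
    """Replay a single heap insert seen before any running-xacts record."""
    arr = KnownAssignedXidsModel(max_slots, oldest_active_xid)
    arr.record(first_xid)
    return len(arr.xids)
```

With the defaults, replay_first_insert() needs 9901 slots for the range 100-10000 and raises, because only 6890 are available. The pruning in prune_to() would shrink the set again, but the model never gets that far.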

Hmm. I'm not sure off the top of my head how to fix that. Perhaps stash the xids we see during WAL replay in private memory instead of putting them in the KnownAssignedXids array until we see the running-xacts record.
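
One possible reading of that idea, again as a toy Python sketch rather than a patch: xids seen before the running-xacts record go into a private set, with no gap filling, and the bounded array is only populated once that record arrives. The class, its methods and the next_xid parameter are all hypothetical names chosen for the illustration.

```python
# Toy sketch of deferring xid tracking until the running-xacts record.
# All names are hypothetical and the model ignores subtransaction details.

class DeferredXidTracker:
    def __init__(self, max_slots):
        self.max_slots = max_slots
        self.initialized = False
        self.pending = set()   # private stash, not limited by max_slots
        self.known = set()     # stands in for the shared, bounded array
        self.latest_observed = None

    def observe(self, xid):
        """Called for each xid seen in a replayed WAL record."""
        if not self.initialized:
            self.pending.add(xid)  # remember only this xid, no gap filling
            return
        if xid <= self.latest_observed:
            return
        new = range(self.latest_observed + 1, xid + 1)
        if len(self.known) + len(new) > self.max_slots:
            raise OverflowError("too many KnownAssignedXids")
        self.known.update(new)
        self.latest_observed = xid

    def finish(self, xid):
        """Commit or abort record: forget the xid wherever it is stored."""
        self.pending.discard(xid)
        self.known.discard(xid)

    def running_xacts(self, running_xids, next_xid):
        """Running-xacts record: build the bounded set from it plus the stash."""
        merged = set(running_xids) | self.pending
        if len(merged) > self.max_slots:
            raise OverflowError("too many KnownAssignedXids")
        self.known = merged
        self.pending.clear()
        self.latest_observed = next_xid - 1
        self.initialized = True


tracker = DeferredXidTracker(max_slots=6890)
tracker.observe(10000)                               # heap insert seen first
tracker.running_xacts([100, 10000], next_xid=10001)  # record arrives later
tracker.observe(10003)                               # normal tracking resumes
```

In this model the early heap insert costs one entry in private memory instead of 9901 array slots, and the array starts out holding only what the running-xacts record and the stash say is still in progress.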


To reproduce this, I did this in the master:

postgres=# CREATE FUNCTION insertfunc(n integer) RETURNS VOID AS $$
declare
  i integer;
begin
  FOR i IN 1..n LOOP
    BEGIN
      INSERT INTO foo VALUES (1);
    EXCEPTION WHEN division_by_zero THEN RAISE NOTICE 'divbyzero';
    END;
  END LOOP;
end;
$$ LANGUAGE plpgsql;
postgres=# SELECT insertfunc(100000000);

After letting that run for a while, so that a couple of checkpoints have occurred, kill the master and start standby to recover that from archive. After it has replayed all the WAL, stop the standby and restart it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

