Re: [BUGS] BUG #4854: Problems with replaying WAL files on Warm Standby

Keith Pierno Tue, 16 Jun 2009 06:21:22 -0700

Here is the timeline of events (all dates are MM/DD/YYYY):

    06/09/2009 1310 EDT - Hardware fault on primary database server db01pri
    06/09/2009 1325 EDT - Failover to warm standby db01sec
    06/12/2009 1615 EDT - db01pri server fixed and OS booted
    06/15/2009 1115 EDT - Started recovery of the 06/15/2009 0205 EDT hotbackup from db01sec onto db01pri
    06/15/2009 1320 EDT - Attempted to start postgres on db01pri in warm standby mode
    06/15/2009 1325 EDT - Failed to apply WAL logs; errors with "unexpected timeline ID"
    06/15/2009 1340 EDT - Started a new hotbackup on db01sec
    06/15/2009 1545 EDT - Started recovery of the 06/15/2009 1340 hotbackup onto db01pri
    06/15/2009 1430 EDT - db01pri recovered and running in warm standby

Here are the contents of the pg_xlog directory and the 00000004.history file:

[postg...@db01pri ~]$ cat 00000004.history
1    0000000100000736000000A1    before transaction 0 at 1999-12-31 19:00:00-05
[postg...@db01pri ~]$ ls -l
total 98468
-rw------- 1 postgres postgres       74 Jul 10 2008 00000002.history
-rw------- 1 postgres postgres       74 Jun 9 13:29 00000003.history
-rw------- 1 postgres postgres 16777216 Jun 16 08:45 0000000400000749000000C9
-rw------- 1 postgres postgres 16777216 Jun 16 08:46 0000000400000749000000CA
-rw------- 1 postgres postgres 16777216 Jun 16 08:47 0000000400000749000000CB
-rw------- 1 postgres postgres       74 Jun 9 13:33 00000004.history
drwxr-xr-x 2 postgres postgres    32768 Jun 16 08:46 archive_status
-rw------- 1 postgres postgres 16777216 Jun 9 13:45 xlogtemp.17243
-rw------- 1 postgres postgres 16777216 Jun 9 13:45 xlogtemp.17244
-rw------- 1 postgres postgres 16777216 Jun 9 13:52 xlogtemp.17397
[postg...@db01pri ~]$
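
For anyone reading the listing above: the first 8 hex digits of a WAL segment file name are the timeline ID, followed by 8 digits for the log file number and 8 for the segment number. The following is only a rough sketch for decoding the names and the history line shown here; the function names parse_wal_segment and parse_history_line are made up for illustration and are not part of PostgreSQL, and the history line is assumed to be three whitespace-separated fields as it appears in the output above.

```python
import re

# A WAL segment name is 24 hex digits: timeline ID, log file number, segment number.
WAL_SEGMENT_RE = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


def parse_wal_segment(name):
    """Return (timeline, log, segment) as integers, or None if not a segment name."""
    match = WAL_SEGMENT_RE.match(name)
    if match is None:
        return None
    timeline, log, segment = (int(group, 16) for group in match.groups())
    return timeline, log, segment


def parse_history_line(line):
    """Split one history-file line into (parent timeline, segment name, reason).

    Assumes the layout seen in the listing: three whitespace-separated fields,
    with the reason text running to the end of the line.
    """
    parent_timeline, segment_name, reason = line.split(None, 2)
    return int(parent_timeline), segment_name, reason.strip()


if __name__ == "__main__":
    # File names copied from the pg_xlog listing above.
    names = [
        "0000000400000749000000C9",
        "0000000400000749000000CA",
        "0000000400000749000000CB",
        "xlogtemp.17243",
    ]
    for name in names:
        print(name, parse_wal_segment(name))

    history = "1    0000000100000736000000A1    before transaction 0 at 1999-12-31 19:00:00-05"
    print(parse_history_line(history))
```

Decoded this way, all three segments in pg_xlog are on timeline 4, while the only line in 00000004.history names timeline 1 as its parent.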

Thanks again,

Keith

Tom Lane wrote:

Keith Pierno <[email protected]> writes:

The backup used was from well after the failover time which is why I
was concerned. Interestingly enough the logs are still all prefixed
with 00000004... That just makes this problem extremely bizarre.


Hmm, that *is* weird.  It seems like the new primary must have reverted
its decision to go from timeline 4 to timeline 6.  (Which in itself is
a bit odd; why not timeline 5?)


Can you give us an exact sequence of events on the slave server/new
primary around the time of the failover?  Also, what was in the .history
file when you found it, and are there any other history files?

			regards, tom lane
