Re: [BUGS] BUG #8294: new timeline 6 forked off current dat abase system timeline 5 before current recovery point 0/100000

Heikki Linnakangas Mon, 19 Aug 2013 00:25:50 -0700

(Quoted pg_controldata output edited to highlight the important parts)


On 11.07.2013 06:26, [email protected] wrote:

PostgreSQL 9.3 beta2 stream replication primary and standby cann't
switchover.
...


Primary :

psql
checkpont;
pg_controldata
...
Database cluster state:               in production
Latest checkpoint location:           0/C000060
Prior checkpoint location:            0/B000060
Latest checkpoint's REDO location:    0/C000028
Latest checkpoint's TimeLineID:       4
Latest checkpoint's PrevTimeLineID:   4
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0

Standby :
Database cluster state:               in archive recovery
Latest checkpoint location:           0/B000060
Prior checkpoint location:            0/B000060
Latest checkpoint's REDO location:    0/B000028
Latest checkpoint's TimeLineID:       4
Latest checkpoint's PrevTimeLineID:   4
Minimum recovery ending location:     0/B0000F0
Min recovery ending loc's timeline:   4

So at this point you have a primary, and a standby server following theprimary through streaming replication.

Primary :
pg_ctl stop
waiting for server to shut down.... done
server stopped
pg_controldata
Database cluster state:               shut down
Latest checkpoint location:           0/D000028
Prior checkpoint location:            0/C000060
Latest checkpoint's REDO location:    0/D000028
Latest checkpoint's TimeLineID:       4
Latest checkpoint's PrevTimeLineID:   4
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0

standby :
pg_controldata
Database cluster state:               in archive recovery
Latest checkpoint location:           0/B000060
Prior checkpoint location:            0/B000060
Latest checkpoint's REDO location:    0/B000028
Latest checkpoint's TimeLineID:       4
Latest checkpoint's PrevTimeLineID:   4
Minimum recovery ending location:     0/B0000F0
Min recovery ending loc's timeline:   4

pg_ctl promote
server promoting

pg_controldata
Database cluster state:               in production
Latest checkpoint location:           0/D000090
Prior checkpoint location:            0/B000060
Latest checkpoint's REDO location:    0/D000058
Latest checkpoint's TimeLineID:       5
Latest checkpoint's PrevTimeLineID:   5


The primary is shut down, and the standby is promoted to become new master.

primary  :
  cd $PGDATA
  mv recovery.done recovery.conf
  pg_ctl start
  log :
2013-07-11 11:10:11.386 CST,,,14911,,51de2213.3a3f,1,,2013-07-11 11:10:11
CST,,0,LOG,00000,"database system was shut down in recovery at 2013-07-11
11:09:51 CST",,,,,,,,"StartupXLOG, xlog.c:4895",""
2013-07-11 11:10:11.387 CST,,,14911,,51de2213.3a3f,2,,2013-07-11 11:10:11
CST,,0,LOG,00000,"entering standby mode",,,,,,,,"StartupXLOG,
xlog.c:4968",""
2013-07-11 11:10:11.391 CST,,,14911,,51de2213.3a3f,3,,2013-07-11 11:10:11
CST,1/0,0,LOG,00000,"consistent recovery state reached at
0/D000090",,,,,,,,"CheckRecoveryConsistency, xlog.c:6187",""
2013-07-11 11:10:11.391 CST,,,14911,,51de2213.3a3f,4,,2013-07-11 11:10:11
CST,1/0,0,LOG,00000,"record with zero length at
0/D000090",,,,,,,,"ReadRecord, xlog.c:3285",""
2013-07-11 11:10:11.392 CST,,,14909,,51de2213.3a3d,1,,2013-07-11 11:10:11
CST,,0,LOG,00000,"database system is ready to accept read only
connections",,,,,,,,"sigusr1_handler, postmaster.c:4658",""
2013-07-11 11:10:11.407 CST,,,14915,,51de2213.3a43,1,,2013-07-11 11:10:11
CST,,0,LOG,00000,"fetching timeline history file for timeline 5 from primary
server",,,,,,,,"WalRcvFetchTimeLineHistoryFiles, walreceiver.c:666",""
2013-07-11 11:10:11.411 CST,,,14915,,51de2213.3a43,2,,2013-07-11 11:10:11
CST,,0,LOG,00000,"primary server contains no more WAL on requested timeline
4",,,,,,,,"WalReceiverMain, walreceiver.c:529",""
2013-07-11 11:10:11.411 CST,,,14911,,51de2213.3a3f,5,,2013-07-11 11:10:11
CST,1/0,0,LOG,00000,"new timeline 5 forked off current database system
timeline 4 before current recovery point
0/D000090",,,,,,,,"rescanLatestTimeLine, xlog.c:3441",""

The old primary is restarted in standby-mode. It tries to connect to thenew primary, but it refuses to follow it because there is some WALapplied on the old primary that was not replicated to the new primarybefore the switchover.

I believe this is the same issue that Fujii reported in June:http://www.postgresql.org/message-id/CAHGQGwHLjEROTMtSWJd=xg_vfwre3ojwntysybdubrya6rr...@mail.gmail.com.This was fixed in commit 0b958f3efcfcc3d9b0e39d550b705a28763bc9e2 onJune 25th:http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0b958f3efcfcc3d9b0e39d550b705a28763bc9e2

Unfortunately that commit didn't make it into 9.3beta2, which waswrapped just one day before that commit. Could you try again with afresh checkout from REL9_3_STABLE branch? Or if you can't easily buildfrom sources, you can wait for the 9.3rc1 release, which should beavailable later this week(http://www.postgresql.org/message-id/[email protected]).


Thanks for the report! Please let us know if the next version fixes this.

- Heikki


--
Sent via pgsql-bugs mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #8294: new timeline 6 forked off current dat abase system timeline 5 before current recovery point 0/100000

Reply via email to