> > Caveat #2 applies here > > http://developer.postgresql.org/pgdocs/postgres/hot-standby.html#HOT-STANDBY-CAVEATS > > The consistent state is delayed until your long running transactions > end, which is workload dependent but transient. >
I'm not quite sure how this correlates to what I'm seeing in 9.1.1. When attempting to start a hotstandby while the primary is under load it seems to get stuck in that 'starting up' mode even when there are no open transactions. If I get it into this state, and then remove all the load from the primary it still will not finish starting up. If I select from pg_stat_activity on the primary it shows a couple connections, but they all have null 'xact_start's and <IDLE> 'current_query's. Even if I then kill all the connections to the primary (via pg_terminate_backend) the hotstandby still will not finish starting up. I would assume there can't be any transactions in progress if there are no connections to the primary. Attempting to restart the hotstandby when in this state produces the same result. If there is a transaction in progress for the duration of the backup (or something like that) can it cause it to get into this state? Here is a debug3 log of this startup from 9.1.1: 2011-10-26 11:25:32.931 MDT [22640]: [1-1] LOG: database system was shut down in recovery at 2011-10-26 11:25:02 MDT 2011-10-26 11:25:32.931 MDT [22640]: [2-1] DEBUG: standby_mode = 'on' 2011-10-26 11:25:32.931 MDT [22640]: [3-1] DEBUG: primary_conninfo = 'host=othernode port=5432 user=postgres application_name=sync_rep_test' 2011-10-26 11:25:32.931 MDT [22640]: [4-1] DEBUG: trigger_file = '/db/data/trigger_file' 2011-10-26 11:25:32.931 MDT [22640]: [5-1] LOG: entering standby mode 2011-10-26 11:25:32.931 MDT [22640]: [6-1] DEBUG: checkpoint record is at 7/96667628 2011-10-26 11:25:32.931 MDT [22640]: [7-1] DEBUG: redo record is at 7/960000A8; shutdown FALSE 2011-10-26 11:25:32.931 MDT [22640]: [8-1] DEBUG: next transaction ID: 0/11855637; next OID: 3219844 2011-10-26 11:25:32.931 MDT [22640]: [9-1] DEBUG: next MultiXactId: 1895; next MultiXactOffset: 3810 2011-10-26 11:25:32.931 MDT [22640]: [10-1] DEBUG: oldest unfrozen transaction ID: 1669, in database 1 2011-10-26 11:25:32.931 MDT [22640]: [11-1] DEBUG: transaction ID wrap limit is 2147485316, limited by database with OID 1 2011-10-26 11:25:32.932 MDT [22640]: [12-1] DEBUG: resetting unlogged relations: cleanup 1 init 0 2011-10-26 11:25:32.939 MDT [22640]: [13-1] DEBUG: initializing for hot standby 2011-10-26 11:25:32.939 MDT [22640]: [14-1] LOG: redo starts at 7/960000A8 2011-10-26 11:25:33.027 MDT [22640]: [15-1] DEBUG: running transaction data initialized 2011-10-26 11:25:33.027 MDT [22640]: [16-1] CONTEXT: xlog redo running xacts: nextXid 11855637 latestCompletedXid 11855636 oldestRunningXid 11855250; 195 xacts: 11855359 11855360 ... 11855444 11855445; subxid ovf 2011-10-26 11:25:33.028 MDT [22640]: [17-1] DEBUG: 0 KnownAssignedXids (num=0 tail=0 head=0) 2011-10-26 11:25:33.028 MDT [22640]: [18-1] CONTEXT: xlog redo running xacts: nextXid 11855637 latestCompletedXid 11855636 oldestRunningXid 11855250; 195 xacts: 11855359 11855360 ... 11855444 11855445; subxid ovf 2011-10-26 11:25:33.028 MDT [22640]: [19-1] LOG: consistent state delayed because recovery snapshot incomplete 2011-10-26 11:25:33.028 MDT [22640]: [20-1] CONTEXT: xlog redo running xacts: nextXid 11855637 latestCompletedXid 11855636 oldestRunningXid 11855250; 195 xacts: 11855359 11855360 ... 11855444 11855445; subxid ovf 2011-10-26 11:25:33.514 MDT [22640]: [21-1] LOG: consistent recovery state reached at 7/98480E28 2011-10-26 11:25:33.515 MDT [22640]: [22-1] LOG: record with zero length at 7/98480E28 2011-10-26 11:25:33.515 MDT [22640]: [23-1] DEBUG: could not open file "pg_xlog/000000010000000700000098" (log file 7, segment 152): No such file or directory 2011-10-26 11:25:33.515 MDT [22642]: [1-1] DEBUG: find_in_dynamic_libpath: trying "/usr/local/postgresql-9.1.1/lib/libpqwalreceiver" 2011-10-26 11:25:33.515 MDT [22642]: [2-1] DEBUG: find_in_dynamic_libpath: trying "/usr/local/postgresql-9.1.1/lib/libpqwalreceiver.so" DEBUG: received replication command: IDENTIFY_SYSTEM DEBUG: received replication command: START_REPLICATION 7/98000000 2011-10-26 11:25:33.519 MDT [22642]: [3-1] LOG: streaming replication successfully connected to primary 2011-10-26 11:25:33.519 MDT [22642]: [4-1] DEBUG: sending write 0/0 flush 0/0 apply 7/98480E28 2011-10-26 11:25:33.522 MDT [22642]: [5-1] DEBUG: sending write 7/98020000 flush 0/0 apply 7/98480E28 2011-10-26 11:25:33.568 MDT [22642]: [6-1] DEBUG: sending write 7/98020000 flush 7/98020000 apply 7/98480E28 2011-10-26 11:25:33.569 MDT [22642]: [7-1] DEBUG: sending write 7/98040000 flush 7/98020000 apply 7/98480E28 ... 2011-10-26 11:25:33.924 MDT [22642]: [49-1] DEBUG: sending write 7/98460000 flush 7/98400000 apply 7/98480E28 2011-10-26 11:25:33.943 MDT [22642]: [50-1] DEBUG: sending write 7/98460000 flush 7/98460000 apply 7/98480E28 DEBUG: standby "sync_rep_test" has now caught up with primary 2011-10-26 11:25:33.943 MDT [22642]: [51-1] DEBUG: sending write 7/98480E28 flush 7/98460000 apply 7/98480E28 2011-10-26 11:25:33.959 MDT [22642]: [52-1] DEBUG: sending write 7/98480E28 flush 7/98480E28 apply 7/98480E28 2011-10-26 11:25:34.192 MDT [22634]: [33-1] DEBUG: forked new backend, pid=22644 socket=7 2011-10-26 11:25:34.192 MDT [22644]: [1-1] FATAL: the database system is starting up 2011-10-26 11:25:34.192 MDT [22644]: [2-1] DEBUG: shmem_exit(1): 0 callbacks to make 2011-10-26 11:25:34.192 MDT [22644]: [3-1] DEBUG: proc_exit(1): 1 callbacks to make 2011-10-26 11:25:34.192 MDT [22644]: [4-1] DEBUG: exit(1) 2011-10-26 11:25:34.192 MDT [22644]: [5-1] DEBUG: shmem_exit(-1): 0 callbacks to make 2011-10-26 11:25:34.192 MDT [22644]: [6-1] DEBUG: proc_exit(-1): 0 callbacks to make 2011-10-26 11:25:34.192 MDT [22634]: [34-1] DEBUG: server process (PID 22644) exited with exit code 1 2011-10-26 11:25:35.171 MDT [22634]: [35-1] DEBUG: forked new backend, pid=22655 socket=7 2011-10-26 11:25:35.171 MDT [22655]: [1-1] FATAL: the database system is starting up 2011-10-26 11:25:35.171 MDT [22655]: [2-1] DEBUG: shmem_exit(1): 0 callbacks to make 2011-10-26 11:25:35.171 MDT [22655]: [3-1] DEBUG: proc_exit(1): 1 callbacks to make 2011-10-26 11:25:35.171 MDT [22655]: [4-1] DEBUG: exit(1) 2011-10-26 11:25:35.171 MDT [22655]: [5-1] DEBUG: shmem_exit(-1): 0 callbacks to make 2011-10-26 11:25:35.171 MDT [22655]: [6-1] DEBUG: proc_exit(-1): 0 callbacks to make 2011-10-26 11:25:35.172 MDT [22634]: [36-1] DEBUG: server process (PID 22655) exited with exit code 1 and so on...