Master/slave replication, the master and slave both have full_page_writes 
disabled because the system is running on ZFS.

While the master was still being setup and data was being loaded, the system 
ran out of open file descriptors and Pg crashed a few times until this was 
realized and corrected. Streaming replication was setup and was correctly 
passing data to the slave:

repmgr@db01=# select * FROM pg_stat_replication ;
  pid  | usesysid | usename | application_name |   client_addr   | 
client_hostname | client_port |         backend_start         |   state   | 
sent_location | write_location | flush_location | replay_location | 
sync_priority | sync_state 
-------+----------+---------+------------------+-----------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 68057 |    16384 | repmgr  | walreceiver      | 192.168.100.183 |              
   |       60743 | 2012-10-06 16:41:36.963769+00 | streaming | 9E/E2C81210   | 
9E/E2C81210    | 9E/E2C81210    | 9E/E2C80C68     |             0 | async
(1 row)

Once the host was up and running, I kicked off a manual VACUUM, at which point 
I see a bunch of the following errors on the master:

INFO:  vacuuming "schma.tbl"
WARNING:  relation "tbl" page 65305 is uninitialized --- fixing
WARNING:  relation "tbl" page 70927 is uninitialized --- fixing
WARNING:  relation "tbl" page 75286 is uninitialized --- fixing
WARNING:  relation "tbl" page 86601 is uninitialized --- fixing
WARNING:  relation "tbl" page 91434 is uninitialized --- fixing
WARNING:  relation "tbl" page 91435 is uninitialized --- fixing
WARNING:  relation "tbl" page 95639 is uninitialized --- fixing
WARNING:  relation "tbl" page 105638 is uninitialized --- fixing
WARNING:  relation "tbl" page 114550 is uninitialized --- fixing
WARNING:  relation "tbl" page 118380 is uninitialized --- fixing
WARNING:  relation "tbl" page 121478 is uninitialized --- fixing
WARNING:  relation "tbl" page 122098 is uninitialized --- fixing
WARNING:  relation "tbl" page 127195 is uninitialized --- fixing
WARNING:  relation "tbl" page 152416 is uninitialized --- fixing
WARNING:  relation "tbl" page 230192 is uninitialized --- fixing
WARNING:  relation "tbl" page 241091 is uninitialized --- fixing
INFO:  scanned index "tbl_pkey" to remove 447 row versions

and on the slave I see:

Oct  6 17:04:02 db02 postgres[4535]: [14-1] @ 4535 0: WARNING:  page 2943 of 
relation base/16387/20196 is uninitialized
Oct  6 17:04:02 db02 postgres[4535]: [14-2] @ 4535 0: CONTEXT:  xlog redo 
vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:04:02 db02 postgres[4535]: [15-1] @ 4535 0: PANIC:  WAL contains 
references to invalid pages
Oct  6 17:04:02 db02 postgres[4535]: [15-2] @ 4535 0: CONTEXT:  xlog redo 
vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:04:02 db02 postgres[4529]: [10-1] @ 4529 0: LOG:  startup process 
(PID 4535) was terminated by signal 6: Abort trap
Oct  6 17:04:02 db02 postgres[4529]: [11-1] @ 4529 0: LOG:  terminating any 
other active server processes
Oct  6 17:04:02 db02 postgres[8345]: [26-1] pgsql@repmgr 8345 0: WARNING:  
terminating connection because of crash of another server process
Oct  6 17:04:02 db02 postgres[8345]: [26-2] pgsql@repmgr 8345 0: DETAIL:  The 
postmaster has commanded this server process to roll back the current 
transaction and exit, because another server process exited abnormally and 
possibly corrupted shared memory.
Oct  6 17:04:02 db02 postgres[8345]: [26-3] pgsql@repmgr 8345 0: HINT:  In a 
moment you should be able to reconnect to the database and repeat your command.

Restarting the slave, I see:

Oct  6 17:11:52 db02 postgres[15191]: [12-1] @ 15191 0: LOG:  consistent 
recovery state reached at 9E/E3DDE2B0
Oct  6 17:11:52 db02 postgres[15185]: [9-1] @ 15185 0: LOG:  database system is 
ready to accept read only connections
Oct  6 17:12:03 db02 postgres[15191]: [13-1] @ 15191 0: WARNING:  page 2943 of 
relation base/16387/20196 is uninitialized
Oct  6 17:12:03 db02 postgres[15191]: [13-2] @ 15191 0: CONTEXT:  xlog redo 
vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:12:03 db02 postgres[15191]: [14-1] @ 15191 0: PANIC:  WAL contains 
references to invalid pages
Oct  6 17:12:03 db02 postgres[15191]: [14-2] @ 15191 0: CONTEXT:  xlog redo 
vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:12:03 db02 postgres[15185]: [10-1] @ 15185 0: LOG:  startup process 
(PID 15191) was terminated by signal 6: Abort trap
Oct  6 17:12:03 db02 postgres[15185]: [11-1] @ 15185 0: LOG:  terminating any 
other active server processes

At this point in time, I'm suspecting full_page_writes=off is the culprit and 
it's violating logic in several places. I haven't tracked things down further, 
but I thought I'd at least bring this to people's attention so that the 
appropriate warnings can be put in place. The slave can not be recovered with 
the corrupted WAL entries, however, and they need to be resynced with 
full_page_writes enabled (at least that's my assumption at present):

Oct  6 17:26:34 db02 postgres[19614]: [12-1] @ 19614 0: LOG:  consistent 
recovery state reached at 9E/E3DDE2B0
Oct  6 17:26:34 db02 postgres[19608]: [11-1] @ 19608 0: LOG:  database system 
is ready to accept read only connections
Oct  6 17:26:44 db02 postgres[19614]: [13-1] @ 19614 0: WARNING:  page 2943 of 
relation base/16387/20196 is uninitialized
Oct  6 17:26:44 db02 postgres[19614]: [13-2] @ 19614 0: CONTEXT:  xlog redo 
vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:26:44 db02 postgres[19614]: [14-1] @ 19614 0: PANIC:  WAL contains 
references to invalid pages
Oct  6 17:26:44 db02 postgres[19614]: [14-2] @ 19614 0: CONTEXT:  xlog redo 
vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:26:44 db02 postgres[19608]: [12-1] @ 19608 0: LOG:  startup process 
(PID 19614) was terminated by signal 6: Abort trap
Oct  6 17:26:44 db02 postgres[19608]: [13-1] @ 19608 0: LOG:  terminating any 
other active server processes


Cheers. I'll report back if I dig anything else up. -sc


[ plz CC as I'm not subscribed to the list ]]


--
Sean Chittenden
s...@chittenden.org



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to