> So that explains the crashes, but it doesn't (directly) explain why you
> had data corruption.

The first crash message:

Oct  5 15:00:21 db01 postgres[75852]: [6406-1] failjava@dbcluster 75852 0: 
ERROR:  could not seek to end of file "base/16387/18770": Too many open files 
in system
Oct  5 15:00:22 db01 postgres[75852]: [6413-1] failjava@dbcluster 75852 0: 
ERROR:  could not seek to end of file "base/16387/20259": Too many open files 
in system
Oct  5 15:00:23 db01 postgres[75852]: [6424-1] failjava@dbcluster 75852 0: 
ERROR:  could not seek to end of file "base/16387/20218": Too many open files 
in system
Oct  5 15:00:23 db01 postgres[75852]: [6431-1] failjava@dbcluster 75852 0: 
ERROR:  could not seek to end of file "base/16387/20218": Too many open files 
in system
Oct  5 15:00:24 db01 postgres[75852]: [6442-1] failjava@dbcluster 75852 0: 
ERROR:  could not seek to end of file "base/16387/20218": Too many open files 
in system
Oct  5 15:00:24 db01 postgres[75852]: [6449-1] failjava@dbcluster 75852 0: 
ERROR:  could not open file "base/16387/20302": Too many open files in system
Oct  5 15:00:24 db01 postgres[75852]: [6456-1] failjava@dbcluster 75852 0: 
ERROR:  could not seek to end of file "base/16387/20013": Too many open files 
in system
Oct  5 15:00:24 db01 postgres[75852]: [6463-1] failjava@dbcluster 75852 0: 
ERROR:  could not seek to end of file "base/16387/20259": Too many open files 
in system
Oct  5 15:00:24 db01 postgres[75852]: [6470-1] failjava@dbcluster 75852 0: 
ERROR:  could not seek to end of file "base/16387/20259": Too many open files 
in system
Oct  5 15:00:24 db01 postgres[75852]: [6480-1] failjava@dbcluster 75852 0: 
ERROR:  could not open file "base/16387/20275": Too many open files in system
Oct  5 15:00:25 db01 postgres[75852]: [6481-1] failjava@dbcluster 75852 0: 
ERROR:  current transaction is aborted, commands ignored until end of 
transaction block
Oct  5 15:00:25 db01 postgres[76648]: [5944-1] failjava@dbcluster 76648 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:00:25 db01 postgres[76649]: [5944-1] failjava@dbcluster 76649 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:00:25 db01 postgres[76407]: [5894-1] failjava@dbcluster 76407 0: 
WARNING:  terminating connection because of crash of another server process

For the sake of completeness, I looked to correlate the above relations with 
the tables that had problems, and there is zero correlation between the seek(2) 
errors and the tables for which VACUUM reported invalid pages.

     nspname     |     relname     | relnamespace | relfilenode | relkind |  oid  
-----------------+-----------------+--------------+-------------+---------+-------
 failjava_master | tbl_F           |        16388 |       16402 | r       | 16402
 failjava_db1    | tbl_D           |        16417 |       18473 | r       | 18473
 failjava_db1    | tbl_E           |        16417 |       20179 | r       | 20179
 failjava_db1    | tbl_C           |        16417 |       20187 | r       | 20187
 failjava_db1    | tbl_B           |        16417 |       20200 | r       | 20200
 failjava_db1    | tbl_A           |        16417 |       41731 | r       | 19533
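
For anyone who wants to reproduce that listing, a pg_class/pg_namespace join 
along these lines should do it (a reconstruction, not necessarily the exact 
query I ran):

    psql -d dbcluster -c "
      SELECT n.nspname, c.relname, c.relnamespace, c.relfilenode, c.relkind, c.oid
        FROM pg_class c
        JOIN pg_namespace n ON n.oid = c.relnamespace
       WHERE c.relname IN ('tbl_A','tbl_B','tbl_C','tbl_D','tbl_E','tbl_F')
         AND c.relkind = 'r';"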


But I'm wondering if it died a second time before the system had finished its 
recovery:

Oct  5 15:01:31 db01 postgres[76888]: [6503-1] @ 76888 0: WARNING:  could not 
open statistics file "pg_stat_tmp/pgstat.stat": Too many open files in system
Oct  5 15:01:31 db01 postgres[77571]: [5978-1] @ 77571 0: FATAL:  pipe() 
failed: Too many open files in system
Oct  5 15:01:31 db01 postgres[77572]: [5979-1] failjava@dbcluster 77572 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:01:31 db01 postgres[77572]: [5980-1] failjava@dbcluster 77572 0: 
WARNING:  terminating connection because of crash of another server process
Oct  5 15:01:31 db01 postgres[77544]: [7490-1] failjava@dbcluster 77544 0: 
WARNING:  terminating connection because of crash of another server process

Unfortunately the logs that I have for the first set of crashes are limited to 
WARNING and above and don't include INFO-level messages (I do have those later 
on; due to various IRL constraints - hardware failure of the intended host - 
the provisioning of the master was a bit of a cannonball dash).
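
For anyone in the same boat, getting INFO-level messages into the logs is just 
a matter of lowering log_min_messages and reloading, roughly (paths are 
illustrative):

    echo 'log_min_messages = info' >> $PGDATA/postgresql.conf
    pg_ctl -D $PGDATA reload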

> I think the uninitialized pages are showing up because you had crashes
> in the midst of relation-extension operations, ie, some other backend
> had successfully done an smgrextend but hadn't yet laid down any valid
> data in the new page.  However, this theory would not explain more than
> one uninitialized page per crash, and your previous message seems to
> show rather a lot of uninitialized pages.  How many pipe-failed crashes
> did you have?

Assuming ~1min per recovery, it looks like around 38 (a rough way of counting 
them is sketched after the log excerpt below).

Oct  5 15:00:25 db01 postgres[76648]: [5944-1] failjava@dbcluster 76648 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:00:25 db01 postgres[76649]: [5944-1] failjava@dbcluster 76649 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:01:31 db01 postgres[77571]: [5978-1] @ 77571 0: FATAL:  pipe() 
failed: Too many open files in system
Oct  5 15:01:31 db01 postgres[77572]: [5979-1] failjava@dbcluster 77572 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:02:20 db01 postgres[78473]: [5987-1] failjava@dbcluster 78473 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:03:47 db01 postgres[79408]: [6005-1] failjava@dbcluster 79408 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:05:06 db01 postgres[80440]: [6112-1] failjava@dbcluster 80440 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:06:09 db01 postgres[81276]: [6117-1] @ 81276 0: FATAL:  pipe() 
failed: Too many open files in system
Oct  5 15:07:04 db01 postgres[82102]: [6131-1] failjava@dbcluster 82102 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:08:19 db01 postgres[82336]: [6136-1] @ 82336 0: FATAL:  could not 
open archive status directory "pg_xlog/archive_status": Too many open files in 
system
Oct  5 15:08:20 db01 postgres[83084]: [6161-1] @ 83084 0: FATAL:  pipe() 
failed: Too many open files in system
Oct  5 15:08:22 db01 postgres[83091]: [6184-1] failjava@dbcluster 83091 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:09:36 db01 postgres[84096]: [6218-1] failjava@dbcluster 84096 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:10:50 db01 postgres[85064]: [6226-1] failjava@dbcluster 85064 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:12:05 db01 postgres[86106]: [6233-1] failjava@dbcluster 86106 0: 
FATAL:  could not open relation mapping file "global/pg_filenode.map": Too many 
open files in system
Oct  5 15:12:05 db01 postgres[86107]: [6233-1] failjava@dbcluster 86107 0: 
FATAL:  could not open relation mapping file "global/pg_filenode.map": Too many 
open files in system
Oct  5 15:12:05 db01 postgres[86108]: [6233-1] failjava@dbcluster 86108 0: 
FATAL:  could not open relation mapping file "global/pg_filenode.map": Too many 
open files in system
Oct  5 15:12:05 db01 postgres[86109]: [6233-1] failjava@dbcluster 86109 0: 
FATAL:  could not open relation mapping file "global/pg_filenode.map": Too many 
open files in system
Oct  5 15:12:05 db01 postgres[86105]: [6234-1] failjava@dbcluster 86105 0: 
FATAL:  could not open file "base/16387/PG_VERSION": Too many open files in 
system
Oct  5 15:12:08 db01 postgres[86119]: [6252-1] failjava@dbcluster 86119 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:13:36 db01 postgres[87126]: [6295-1] failjava@dbcluster 87126 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:14:23 db01 postgres[88122]: [6364-1] failjava@dbcluster 88122 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:15:31 db01 postgres[88370]: [6369-1] @ 88370 0: FATAL:  could not 
open archive status directory "pg_xlog/archive_status": Too many open files in 
system
Oct  5 15:15:31 db01 postgres[89057]: [6398-1] @ 89057 0: FATAL:  pipe() 
failed: Too many open files in system
Oct  5 15:15:34 db01 postgres[89067]: [6426-1] failjava@dbcluster 89067 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:16:38 db01 postgres[90080]: [6430-1] @ 90080 0: FATAL:  pipe() 
failed: Too many open files in system
Oct  5 15:17:19 db01 postgres[90941]: [6446-1] failjava@dbcluster 90941 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:18:43 db01 postgres[91898]: [6484-1] failjava@dbcluster 91898 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:20:02 db01 postgres[93068]: [6514-1] failjava@dbcluster 93068 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:22:03 db01 postgres[94275]: [6519-1] failjava@dbcluster 94275 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:22:03 db01 postgres[94276]: [6519-1] failjava@dbcluster 94276 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:23:39 db01 postgres[95392]: [6525-1] failjava@dbcluster 95392 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:24:52 db01 postgres[96371]: [6533-1] failjava@dbcluster 96371 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:26:22 db01 postgres[97337]: [6570-1] failjava@dbcluster 97337 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:27:48 db01 postgres[98419]: [6607-1] failjava@dbcluster 98419 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:29:02 db01 postgres[99379]: [6706-1] failjava@dbcluster 99379 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:30:20 db01 postgres[467]: [6719-1] failjava@dbcluster 467 0: FATAL:  
pipe() failed: Too many open files in system
Oct  5 15:32:08 db01 postgres[1479]: [6769-1] failjava@dbcluster 1479 0: FATAL: 
 pipe() failed: Too many open files in system
Oct  5 15:33:28 db01 postgres[2714]: [6776-1] failjava@dbcluster 2714 0: FATAL: 
 pipe() failed: Too many open files in system
Oct  5 15:34:18 db01 postgres[3600]: [6870-1] failjava@dbcluster 3600 0: FATAL: 
 pipe() failed: Too many open files in system
Oct  5 15:36:07 db01 postgres[4608]: [6896-1] failjava@dbcluster 4608 0: FATAL: 
 pipe() failed: Too many open files in system
Oct  5 15:37:18 db01 postgres[5652]: [6944-1] failjava@dbcluster 5652 0: FATAL: 
 pipe() failed: Too many open files in system
Oct  5 15:38:56 db01 postgres[6640]: [7018-1] failjava@dbcluster 6640 0: FATAL: 
 pipe() failed: Too many open files in system
Oct  5 15:39:46 db01 postgres[7680]: [7025-1] failjava@dbcluster 7680 0: FATAL: 
 pipe() failed: Too many open files in system
Oct  5 15:41:08 db01 postgres[8609]: [7081-1] failjava@dbcluster 8609 0: FATAL: 
 pipe() failed: Too many open files in system
Oct  5 15:42:20 db01 postgres[9546]: [7095-1] failjava@dbcluster 9546 0: FATAL: 
 pipe() failed: Too many open files in system
Oct  5 15:43:53 db01 postgres[10468]: [7127-1] failjava@dbcluster 10468 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:45:32 db01 postgres[11634]: [7159-1] failjava@dbcluster 11634 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:47:10 db01 postgres[12721]: [7184-1] failjava@dbcluster 12721 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:48:47 db01 postgres[13831]: [7194-1] failjava@dbcluster 13831 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:49:47 db01 postgres[14912]: [7217-1] failjava@dbcluster 14912 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:51:32 db01 postgres[15928]: [7223-1] failjava@dbcluster 15928 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:52:34 db01 postgres[16954]: [7231-1] failjava@dbcluster 16954 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:53:49 db01 postgres[17963]: [7247-1] failjava@dbcluster 17963 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:55:36 db01 postgres[19309]: [7269-1] failjava@dbcluster 19309 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:57:08 db01 postgres[20273]: [7286-1] failjava@dbcluster 20273 0: 
FATAL:  pipe() failed: Too many open files in system
Oct  5 15:58:34 db01 postgres[20616]: [7291-1] @ 20616 0: FATAL:  could not 
open archive status directory "pg_xlog/archive_status": Too many open files in 
system
Oct  5 15:58:34 db01 postgres[21404]: [7293-1] @ 21404 0: FATAL:  could not 
open archive status directory "pg_xlog/archive_status": Too many open files in 
system
Oct  5 15:58:47 db01 postgres[21442]: [7294-1] @ 21442 0: FATAL:  pipe() 
failed: Too many open files in system
Oct  5 15:58:49 db01 postgres[21450]: [7318-1] failjava@dbcluster 21450 0: 
FATAL:  pipe() failed: Too many open files in system
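
That count came from a quick grep of syslog for the pipe() failures, roughly 
like this (the log path is wherever your syslog writes the postgres facility; 
it counts log lines rather than distinct crashes, so it's only approximate):

    grep -c 'pipe() failed: Too many open files' /var/log/messages
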
[ Once I set up monitoring, I became aware that the app (in a 100% idle state) 
was consuming ~80K fds because the app's initialization procedure touches 
nearly all 500 tables across its ~2K connections ]
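
For anyone wanting to watch for the same failure mode, the system-wide 
descriptor picture on FreeBSD can be sampled with something like the following 
(illustrative commands, not the monitoring I actually deployed):

    # current vs. maximum system-wide open files
    sysctl kern.openfiles kern.maxfiles
    # rough count of descriptors held by postgres processes
    fstat | awk '$2 == "postgres"' | wc -l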

<rant>It would be exceptionally nice if JDBC grew a brain and were written 
intelligently enough that pgbouncer could be used in transaction mode, thereby 
allowing exceptionally poorly written apps and database connection code that 
establish 5-20 connections per thread to be parked in front of pgbouncer 
instead of needing to talk directly to a backend process. Use of duplicate 
names for prepared statements, or preparing statements outside of a 
transaction, really ties the hands of folks who are attempting to insert 
additional layers into the stack to scale up applications that are effectively 
in an immutable and broken state of disrepair. <pipedream>A more realistic but 
disgusting solution would be to teach pgbouncer to intercept all prepared 
statements from a given client and rewrite the names on the fly into something 
that acts as a unique cookie and could be reused across backends whenever a 
given backend hasn't seen a particular prepared statement before.</pipedream></rant>


I'm trying to determine that now. I actually only included a small snippet of 
the uninitialized pages. It looks like there are 28 total uninitialized pages 
that VACUUM discovered:

[ autovacuum appears to have caught these two; however, the slave wasn't turned 
up at this point ]
Oct  5 19:42:44 db01 postgres[37748]: [8908-1] @ 37748 0: WARNING:  relation 
"tbl_E" page 4263 is uninitialized --- fixing
Oct  5 19:43:03 db01 postgres[37324]: [8908-1] @ 37324 0: WARNING:  relation 
"tbl_B" page 2628 is uninitialized --- fixing

[ And the remaining 26 were caught via an explicitly executed `vacuumdb -az` ]
Oct  6 17:03:24 db01 postgres[73864]: [8911-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_F" page 65722 is uninitialized --- fixing
Oct  6 17:03:43 db01 postgres[73864]: [8912-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_D" page 3563 is uninitialized --- fixing
Oct  6 17:03:43 db01 postgres[73864]: [8913-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_D" page 3564 is uninitialized --- fixing
Oct  6 17:03:43 db01 postgres[73864]: [8914-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_D" page 3565 is uninitialized --- fixing
Oct  6 17:03:49 db01 postgres[73864]: [8915-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 65305 is uninitialized --- fixing
Oct  6 17:03:50 db01 postgres[73864]: [8916-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 70927 is uninitialized --- fixing
Oct  6 17:03:50 db01 postgres[73864]: [8917-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 75286 is uninitialized --- fixing
Oct  6 17:03:50 db01 postgres[73864]: [8918-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 86601 is uninitialized --- fixing
Oct  6 17:03:50 db01 postgres[73864]: [8919-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 91434 is uninitialized --- fixing
Oct  6 17:03:50 db01 postgres[73864]: [8920-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 91435 is uninitialized --- fixing
Oct  6 17:03:51 db01 postgres[73864]: [8921-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 95639 is uninitialized --- fixing
Oct  6 17:03:51 db01 postgres[73864]: [8922-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 105638 is uninitialized --- fixing
Oct  6 17:03:51 db01 postgres[73864]: [8923-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 114550 is uninitialized --- fixing
Oct  6 17:03:51 db01 postgres[73864]: [8924-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 118380 is uninitialized --- fixing
Oct  6 17:03:51 db01 postgres[73864]: [8925-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 121478 is uninitialized --- fixing
Oct  6 17:03:51 db01 postgres[73864]: [8926-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 122098 is uninitialized --- fixing
Oct  6 17:03:52 db01 postgres[73864]: [8927-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 127195 is uninitialized --- fixing
Oct  6 17:03:53 db01 postgres[73864]: [8928-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 152416 is uninitialized --- fixing
Oct  6 17:03:55 db01 postgres[73864]: [8929-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 230192 is uninitialized --- fixing
Oct  6 17:03:56 db01 postgres[73864]: [8930-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_C" page 241091 is uninitialized --- fixing
Oct  6 17:20:31 db01 postgres[73864]: [8931-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_A" page 32379133 is uninitialized --- fixing
Oct  6 17:20:32 db01 postgres[73864]: [8932-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_A" page 32385954 is uninitialized --- fixing
Oct  6 17:20:32 db01 postgres[73864]: [8933-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_A" page 32393740 is uninitialized --- fixing
Oct  6 17:20:32 db01 postgres[73864]: [8934-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_A" page 32394599 is uninitialized --- fixing
Oct  6 17:20:32 db01 postgres[73864]: [8935-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_A" page 32402510 is uninitialized --- fixing
Oct  6 17:20:33 db01 postgres[73864]: [8936-1] pgsql@dbcluster 73864 0: 
WARNING:  relation "tbl_A" page 32409328 is uninitialized --- fixing


>> What's odd to me is not the failure scenarios that come from a system 
>> running out of FDs (though seek(2)'ing consuming an FD seems odd), it's more 
>> that it's still possible for a master DB's VACUUM to clean up a bogus or 
>> partial page write, and have the slave crash when the WAL entry is shipped 
>> over.
> 
> It looks to me like vacuumlazy.c doesn't bother to emit a WAL record at
> all when fixing an all-zeroes heap page.  I'm not sure if that's a
> problem or not.  The first actual use of such a page ought to result in
> re-init'ing it anyway (cf XLOG_HEAP_INIT_PAGE logic in heap_insert),
> so right offhand I don't see a path from this to the slave-side failures
> you saw.  (But on the other hand I'm only firing on one cylinder today
> because of a head cold, so maybe I'm just missing it.)

(Good luck w/ the recovery)

> Do the slave-side failures correspond to pages that were reported as
> "fixed" on the master?


It doesn't appear so.

Oct  6 17:12:03 db02 postgres[15191]: [13-1] @ 15191 0: WARNING:  page 2943 of 
relation base/16387/20196 is uninitialized
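
To double-check which relation that filenode belongs to, a lookup along these 
lines on the master works (a sketch; note that a relfilenode can drift away 
from the table's OID after a rewrite):

    psql -d dbcluster -c "
      SELECT n.nspname, c.relname
        FROM pg_class c
        JOIN pg_namespace n ON n.oid = c.relnamespace
       WHERE c.relfilenode = 20196;"

20196 also isn't one of the relfilenodes in the pg_class listing earlier in 
this mail.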

I do see a few other odd or related messages on the slave, however:

[ from a recovery ]
Oct  6 07:14:50 db02 postgres[11972]: [12-1] @ 11972 0: LOG:  record with zero 
length at 9E/699842C0
Oct  6 07:14:50 db02 postgres[11972]: [13-1] @ 11972 0: LOG:  redo done at 
9E/69984290
Oct  6 07:14:50 db02 postgres[11972]: [14-1] @ 11972 0: LOG:  last completed 
transaction was at log time 2012-10-06 07:12:11.87694+00
Oct  6 07:14:50 db02 postgres[11972]: [15-1] @ 11972 0: LOG:  checkpoint 
starting: end-of-recovery immediate

[ Best to let sleeping dragons lie: I shut the slave down for the night in case 
some form of feedback could be sent back that would hang the master ]
Oct  6 07:51:32 db02 postgres[15505]: [16-1] @ 15505 0: LOG:  database system 
is shut down

[ I think this is from trying to connect to an IPv6 addr for localhost, but 
IPv6 has been disabled. Including this because the verbosity level seems excessive. ]
Oct  6 16:36:06 db02 postgres[4529]: [7-1] @ 4529 0: LOG:  could not create 
socket for statistics collector: Protocol not supported
Oct  6 16:36:06 db02 postgres[4529]: [8-1] @ 4529 0: LOG:  trying another 
address for the statistics collector

[ Here the database noted the bogus page, but it was still able to start up ]
Oct  6 16:36:06 db02 postgres[4535]: [9-1] @ 4535 0: LOG:  database system was 
interrupted; last known up at 2012-10-06 08:07:41 UTC
Oct  6 16:36:06 db02 postgres[4535]: [10-1] @ 4535 0: LOG:  entering standby 
mode
Oct  6 16:36:06 db02 postgres[4535]: [11-1] @ 4535 0: LOG:  unexpected pageaddr 
88/EE000000 in log file 158, segment 110, offset 0

[ Including this as a small nit: there are two lines emitted for the following 
log entry, which seems incorrect ]
Oct  6 16:41:31 db02 postgres[6416]: [9-1] @ 6416 0: FATAL:  could not connect 
to the primary server: fe_sendauth: no password supplied
Oct  6 16:41:31 db02 postgres[6416]: [9-2]     

Oct  6 16:41:36 db02 postgres[6445]: [9-1] @ 6445 0: LOG:  streaming 
replication successfully connected to primary
Oct  6 16:41:37 db02 postgres[4535]: [12-1] @ 4535 0: LOG:  redo starts at 
9E/6E000020
Oct  6 16:41:37 db02 postgres[4535]: [13-1] @ 4535 0: LOG:  consistent recovery 
state reached at 9E/6E003E10
Oct  6 16:41:37 db02 postgres[4529]: [9-1] @ 4529 0: LOG:  database system is 
ready to accept read only connections


[ Here's what the slave sees on a crash ]
Oct  6 17:09:17 db02 postgres[15561]: [10-1] pgsql@postgres 15561 0: FATAL:  
the database system is starting up
Oct  6 17:11:52 db02 postgres[15191]: [12-1] @ 15191 0: LOG:  consistent 
recovery state reached at 9E/E3DDE2B0
Oct  6 17:11:52 db02 postgres[15185]: [9-1] @ 15185 0: LOG:  database system is 
ready to accept read only connections
Oct  6 17:12:03 db02 postgres[15191]: [13-1] @ 15191 0: WARNING:  page 2943 of 
relation base/16387/20196 is uninitialized
Oct  6 17:12:03 db02 postgres[15191]: [13-2] @ 15191 0: CONTEXT:  xlog redo 
vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:12:03 db02 postgres[15191]: [14-1] @ 15191 0: PANIC:  WAL contains 
references to invalid pages
Oct  6 17:12:03 db02 postgres[15191]: [14-2] @ 15191 0: CONTEXT:  xlog redo 
vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:12:03 db02 postgres[15185]: [10-1] @ 15185 0: LOG:  startup process 
(PID 15191) was terminated by signal 6: Abort trap
[ /me scratches head for a bit ]
Oct  6 17:21:34 db02 postgres[19614]: [9-1] @ 19614 0: LOG:  database system 
was interrupted while in recovery at log time 2012-10-06 08:07:24 UTC
Oct  6 17:21:34 db02 postgres[19614]: [9-2] @ 19614 0: HINT:  If this has 
occurred more than once some data might be corrupted and you might need to 
choose an earlier recovery target.
Oct  6 17:21:34 db02 postgres[19614]: [10-1] @ 19614 0: LOG:  entering standby 
mode
Oct  6 17:21:34 db02 postgres[19614]: [11-1] @ 19614 0: LOG:  redo starts at 
9E/6E000020

[ This is the point at which I came up with my hypothesis and enabled 
full_page_writes ]
Oct  6 17:24:14 db02 postgres[19608]: [9-1] @ 19608 0: LOG:  received SIGHUP, 
reloading configuration files
Oct  6 17:24:14 db02 postgres[19608]: [10-1] @ 19608 0: LOG:  parameter 
"full_page_writes" changed to "on"
Oct  6 17:26:34 db02 postgres[19614]: [12-1] @ 19614 0: LOG:  consistent 
recovery state reached at 9E/E3DDE2B0
Oct  6 17:26:34 db02 postgres[19608]: [11-1] @ 19608 0: LOG:  database system 
is ready to accept read only connections
Oct  6 17:26:44 db02 postgres[19614]: [13-1] @ 19614 0: WARNING:  page 2943 of 
relation base/16387/20196 is uninitialized
Oct  6 17:26:44 db02 postgres[19614]: [13-2] @ 19614 0: CONTEXT:  xlog redo 
vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:26:44 db02 postgres[19614]: [14-1] @ 19614 0: PANIC:  WAL contains 
references to invalid pages
Oct  6 17:26:44 db02 postgres[19614]: [14-2] @ 19614 0: CONTEXT:  xlog redo 
vacuum: rel 1663/16387/20196; blk 2944, lastBlockVacuumed 2403
Oct  6 17:26:44 db02 postgres[19608]: [12-1] @ 19608 0: LOG:  startup process 
(PID 19614) was terminated by signal 6: Abort trap
Oct  6 17:26:44 db02 postgres[19608]: [13-1] @ 19608 0: LOG:  terminating any 
other active server processes
[ Given things were still panic'ing on the same page of data, I enabled 
full_page_writes on the master, performed a VACUUM and a CHECKPOINT, and began 
resync'ing data to the slave ]

[ And here I started up the slave again ]
Oct  7 00:32:00 db02 postgres[66904]: [9-1] @ 66904 0: LOG:  database system 
was interrupted; last known up at 2012-10-06 17:48:10 UTC
Oct  7 00:32:00 db02 postgres[66904]: [10-1] @ 66904 0: LOG:  entering standby 
mode
Oct  7 00:32:00 db02 postgres[66904]: [11-1] @ 66904 0: LOG:  unexpected 
pageaddr 89/B9000000 in log file 159, segment 55, offset 0
Oct  7 00:32:00 db02 postgres[66906]: [9-1] [unknown]@[unknown] 66906 0: LOG:  
connection received: host=[local]
Oct  7 00:32:00 db02 postgres[66906]: [10-1] pgsql@postgres 66906 0: FATAL:  
the database system is starting up
Oct  7 00:32:00 db02 postgres[66905]: [9-1] @ 66905 0: LOG:  streaming 
replication successfully connected to primary
Oct  7 00:32:01 db02 postgres[66913]: [9-1] [unknown]@[unknown] 66913 0: LOG:  
connection received: host=[local]
Oct  7 00:32:01 db02 postgres[66913]: [10-1] pgsql@postgres 66913 0: FATAL:  
the database system is starting up
Oct  7 00:32:01 db02 postgres[66904]: [12-1] @ 66904 0: LOG:  redo starts at 
9F/36000020
Oct  7 00:32:01 db02 postgres[66904]: [13-1] @ 66904 0: LOG:  consistent 
recovery state reached at 9F/36003FD8
Oct  7 00:32:02 db02 postgres[66897]: [9-1] @ 66897 0: LOG:  database system is 
ready to accept read only connections

[ At this point I initiated a VACUUM and rebuilt a 400GB table using 
pg_reorg(1) to evict all WAL files that might contain poisonous data; a rough 
sketch of the invocation is below ]
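
The rebuild amounted to something like the following (the table name is just an 
illustrative one from the anonymized set above; pg_reorg needs its support 
objects installed in the database and a primary key on the table, and the exact 
flags may vary by version):

    vacuumdb -az
    pg_reorg -d dbcluster -t failjava_db1.tbl_A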


And that's about all I know at this point in time. Things have been up and 
running without issue ever since. I'm tempted to pg_reorg(1) all of the tables 
in order to guarantee that every table is 100% intact, since I can't easily 
VACUUM FULL these tables (several of the largest, most frequently used ones are 
several TB in size). LMK if there's anything else you're interested in seeing 
or having outlined. Again, the master recovered gracefully, didn't require any 
manual intervention other than increasing kern.maxfiles, and hasn't skipped a 
beat since kern.maxfiles was increased. Having the slaves crap out when the 
master executed a VACUUM, however, is of interest. When running the manual 
VACUUM, I had a window open to both db01 and db02, and db02's backend 
panic'ing was clearly linked to the VACUUM warnings on db01.

In the regression testing environments, is it possible to enable a test mode 
that runs with full_page_writes disabled, in order to push the boundaries of 
that configuration? The IO bandwidth savings are substantial for small 
mutations, and for folks lucky enough to run on ZFS (i.e. not ufs, ext*), it'd 
be really cool to be able to realize those savings. As it stands, it appears 
that standalone instances of Pg are safe with full_page_writes disabled, but 
that slaves still make assumptions about full-page WAL entries in some corner 
cases. My observations lead me to believe that replication works fine with 
full_page_writes disabled until the master cleans up a zero-filled or damaged 
page.
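
Even just pointing the existing regression suite at a throwaway cluster with 
the GUC flipped would be a start, something like the following (a sketch; on 
its own this won't exercise the crash-recovery and replication corner cases 
that bit me):

    initdb -D /tmp/fpw_test
    echo 'full_page_writes = off' >> /tmp/fpw_test/postgresql.conf
    pg_ctl -D /tmp/fpw_test -w start
    make installcheck    # from a source tree, with PGPORT pointed at the cluster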



For the archives (if anyone makes it this far to the bottom of the note), what 
I did to operationally "fix" things (a rough command sketch follows the list):

1) On the master, re-enabled full_page_writes and HUP'ed pg to reload the 
setting.
2) Fired off a VACUUM ANALYZE to generate some WAL data. This step isn't 
strictly necessary, but I wanted to make sure that there was something to 
CHECKPOINT.
3) Issued a CHECKPOINT and waited until it completed.
4) Re-synced the data to the slave (via rsync(1) or whatever your tool of 
choice is).
5) Restarted the slave.
6) Verified that the slave is in fact replicating via pg_stat_replication.
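
The same steps expressed as commands, roughly (paths, the database name, and 
the sync method are illustrative; adjust to taste):

    ## on the master
    echo 'full_page_writes = on' >> $PGDATA/postgresql.conf
    pg_ctl -D $PGDATA reload                       # 1) reload the setting
    vacuumdb -az                                   # 2) generate some WAL
    psql -d dbcluster -c 'CHECKPOINT;'             # 3)

    ## on the slave
    # 4) re-sync the data directory from the master
    #    (pg_start_backup + rsync + pg_stop_backup, or your tool of choice)
    pg_ctl -D $PGDATA start                        # 5) restart the slave

    ## back on the master
    psql -d dbcluster -c 'SELECT * FROM pg_stat_replication;'   # 6) verify streaming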


-sc



--
Sean Chittenden
s...@chittenden.org


