Hi all,

I have a new 9.1.1 hot standby machine (VM). It consistently starts
up, goes through recovery through these same WAL segments, and then
gets a "Bus error":

2011-11-30 20:57:37 EST LOG:  database system was interrupted while in
recovery at log time 2011-11-30 09:37:11 EST
2011-11-30 20:57:37 EST HINT:  If this has occurred more than once
some data might be corrupted and you might need to choose an earlier
recovery target.
2011-11-30 20:57:37 EST LOG:  entering standby mode, pausing.
2011-11-30 20:57:57 EST LOG:  restored log file
"000000010000005D00000045" from archive
2011-11-30 20:57:58 EST LOG:  redo starts at 5D/4560B61C
2011-11-30 20:57:59 EST LOG:  restored log file
"000000010000005D00000046" from archive
2011-11-30 20:58:00 EST LOG:  restored log file
"000000010000005D00000047" from archive
2011-11-30 20:58:03 EST LOG:  restored log file
"000000010000005D00000048" from archive
2011-11-30 20:58:04 EST LOG:  restored log file
"000000010000005D00000049" from archive
2011-11-30 20:58:05 EST LOG:  restored log file
"000000010000005D0000004A" from archive
2011-11-30 20:58:06 EST LOG:  restored log file
"000000010000005D0000004B" from archive
2011-11-30 20:58:07 EST LOG:  restored log file
"000000010000005D0000004C" from archive
2011-11-30 20:58:07 EST LOG:  startup process (PID 23167) was
terminated by signal 7: Bus error
2011-11-30 20:58:07 EST LOG:  terminating any other active server processes


After I saw the same problem a few times in a row, I stuck a
sleep(20); call right after the "entering standby mode" message (hence
the modified message "entering standby mode, pausing"), so I could
have a few seconds to attach to the startup process with gdb and get a
backtrace; I'm not sure why, but a core dump wasn't produced
automatically, even though I think I've set ulimit appropriately.
Anyway, here is the backtrace:

(gdb) continue
Continuing.

Program received signal SIGBUS, Bus error.
0x0825b329 in ReadBuffer_common (smgr=0xa0be758, relpersistence=112 'p',
    forkNum=MAIN_FORKNUM, blockNum=98560, mode=RBM_ZERO, strategy=0x0,
    hit=0xbfa0403f "") at /usr/include/bits/string3.h:85
85        return __builtin___memset_chk (__dest, __ch, __len, __bos0 (__dest));
(gdb) bt
#0  0x0825b329 in ReadBuffer_common (smgr=0xa0be758, relpersistence=112 'p',
    forkNum=MAIN_FORKNUM, blockNum=98560, mode=RBM_ZERO, strategy=0x0,
    hit=0xbfa0403f "") at /usr/include/bits/string3.h:85
#1  0x0825bcdf in ReadBufferWithoutRelcache (rnode=..., forkNum=MAIN_FORKNUM,
    blockNum=98560, mode=RBM_ZERO, strategy=0x0) at bufmgr.c:272
#2  0x080d4998 in XLogReadBufferExtended (rnode=..., forknum=MAIN_FORKNUM,
    blkno=98560, mode=RBM_ZERO) at xlogutils.c:296
#3  0x080c6b85 in RestoreBkpBlocks (lsn=..., record=0xa0b53c8, cleanup=0 '\000')
    at xlog.c:3589
#4  0x080dd294 in gin_redo (lsn=..., record=0xa0b53c8) at ginxlog.c:676
#5  0x080d1759 in StartupXLOG () at xlog.c:6599
#6  0x080d42a5 in StartupProcessMain () at xlog.c:10028
#7  0x080eb637 in AuxiliaryProcessMain (argc=2, argv=<value optimized out>)
    at bootstrap.c:417
#8  0x0823e5ac in StartChildProcess (type=StartupProcess) at postmaster.c:4498
#9  0x08240afe in PostmasterMain (argc=3, argv=0xa083838) at postmaster.c:1111
#10 0x081e1a88 in main (argc=3, argv=0xa083838) at main.c:199
(gdb)


Some more details about my setup:
The hot standby is an Ubuntu 11.04 32-bit OpenVZ VM. The master is a
Debian Squeeze 32-bit VM. Both master and hot standby are running
9.1.1 compiled from source with:
  ./configure --prefix=/home/postgres/runtime/ --enable-debug --with-python

The master is archiving WAL files to the hot standby using rsync. I
initialized $PGDATA on the standby using a simple:
 rsync -avz postgres@master:/home/postgres/pgdata /home/postgres/

I'll attach my non-default pg_settings and recovery.conf file as well
for completeness. I notice gin_redo() in the backtrace; I am using a
full-text index like this:
    "xxx_txtsearch" gin (to_tsvector('english'::regconfig, title || body))

on a large partitioned table.

Since the standby VM is new (I've used the same hosting provider
before, but this is the first time I tried an Ubuntu VM), it's
entirely possible I messed something up in the set-up. Or it could be
some flakiness with the VM itself.

But before I start blowing things away and trying to start from
scratch, I figured I'd leave the machine alone for a bit in case this
report is interesting and any more sleuthing would help.

Josh
pgdata# SELECT name, setting FROM pg_settings WHERE source != 'default';
            name            |                         setting                   
       
----------------------------+---------------------------------------------------
-------
 application_name           | psql
 archive_command            | rsync -a %p 
postgres@standby:/home/postgres/wal_archive/%f
 archive_mode               | on
 checkpoint_segments        | 30
 client_encoding            | UTF8
 config_file                | /home/postgres/pgdata/postgresql.conf
 data_directory             | /home/postgres/pgdata
 DateStyle                  | ISO, MDY
 default_text_search_config | pg_catalog.english
 effective_cache_size       | 49152
 fsync                      | on
 hba_file                   | /home/postgres/pgdata/pg_hba.conf
 hot_standby                | on
 ident_file                 | /home/postgres/pgdata/pg_ident.conf
 lc_collate                 | en_US.UTF-8
 lc_ctype                   | en_US.UTF-8
 lc_messages                | en_US.UTF-8
 lc_monetary                | en_US.UTF-8
 lc_numeric                 | en_US.UTF-8
 lc_time                    | en_US.UTF-8
 listen_addresses           | *
 log_line_prefix            | %t 
 log_min_duration_statement | 1000
 log_timezone               | America/New_York
 logging_collector          | on
 maintenance_work_mem       | 65536
 max_connections            | 15
 max_stack_depth            | 2048
 max_wal_senders            | 2
 server_encoding            | UTF8
 shared_buffers             | 16384
 synchronous_commit         | off
 TimeZone                   | America/New_York
 transaction_deferrable     | off
 transaction_isolation      | read committed
 transaction_read_only      | off
 wal_buffers                | 512
 wal_keep_segments          | 32
 wal_level                  | hot_standby
 work_mem                   | 24576
(40 rows)

Attachment: recovery.conf
Description: Binary data

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Reply via email to