On Fri, Mar 29, 2013 at 1:52 PM, anara...@anarazel.de <and...@anarazel.de> wrote: > I think I See whats going on. Do you still have the datadir available? If so, > could you send the pg_controldata output?
I have already rebuilt the replica, however below is the output if it is useful: pg_control version number: 922 Catalog version number: 201204301 Database system identifier: 5858109675396804534 Database cluster state: in archive recovery pg_control last modified: Сбт 30 Мар 2013 00:21:11 Latest checkpoint location: 258B/BDBBE748 Prior checkpoint location: 258B/86DABCB8 Latest checkpoint's REDO location: 258B/8B78BED0 Latest checkpoint's TimeLineID: 2 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0/423386899 Latest checkpoint's NextOID: 2758636912 Latest checkpoint's NextMultiXactId: 103920 Latest checkpoint's NextMultiOffset: 431309 Latest checkpoint's oldestXID: 225634745 Latest checkpoint's oldestXID's DB: 16436 Latest checkpoint's oldestActiveXID: 421766298 Time of latest checkpoint: Птн 29 Мар 2013 22:33:01 Minimum recovery ending location: 258C/14AA5FA0 Backup start location: 0/0 Backup end location: 0/0 End-of-backup record required: no Current wal_level setting: hot_standby Current max_connections setting: 550 Current max_prepared_xacts setting: 0 Current max_locks_per_xact setting: 64 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value > > > Andres > >>Hardware: >> >>IBM System x3650 M4, 148GB RAM, NAS >> >>Software: >> >>PostgreSQL 9.2.3, yum.postgresql.org >>CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64 >> >>Configuration: >> >>listen_addresses = '*' >>max_connections = 550 >>shared_buffers = 35GB >>work_mem = 256MB >>maintenance_work_mem = 1GB >>bgwriter_delay = 10ms >>bgwriter_lru_multiplier = 10.0 >>effective_io_concurrency = 32 >>wal_level = hot_standby >>synchronous_commit = off >>checkpoint_segments = 1024 >>checkpoint_timeout = 1h >>checkpoint_completion_target = 0.9 >>checkpoint_warning = 5min >>max_wal_senders = 3 >>wal_keep_segments = 2048 >>hot_standby = on >>max_standby_streaming_delay = 5min >>hot_standby_feedback = on >>effective_cache_size = 133GB >>log_directory = '/var/log/pgsql' >>log_filename = 'postgresql-%Y-%m-%d.log' >>log_checkpoints = on >>log_line_prefix = '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] ' >>log_lock_waits = on >>log_statement = 'ddl' >>log_timezone = 'W-SU' >>track_activity_query_size = 4096 >>autovacuum_max_workers = 5 >>autovacuum_naptime = 5s >>autovacuum_vacuum_scale_factor = 0.05 >>autovacuum_analyze_scale_factor = 0.05 >>autovacuum_vacuum_cost_delay = 5ms >>datestyle = 'iso, dmy' >>timezone = 'W-SU' >>lc_messages = 'en_US.UTF-8' >>lc_monetary = 'ru_RU.UTF-8' >>lc_numeric = 'ru_RU.UTF-8' >>lc_time = 'ru_RU.UTF-8' >>default_text_search_config = 'pg_catalog.russian' >> >>System: >> >># Controls the maximum shared segment size, in bytes >>kernel.shmmax = 53287555072 >> >># Controls the maximum number of shared memory segments, in pages >>kernel.shmall = 13009657 >> >># Maximum number of file-handles >>fs.file-max = 65535 >> >># pdflush tuning to prevent lag spikes >>vm.dirty_ratio = 10 >>vm.dirty_background_ratio = 1 >>vm.dirty_expire_centisecs = 499 >> >># Prevent the scheduler breakdown >>kernel.sched_migration_cost = 5000000 >> >># Turned off to provide more CPU to PostgreSQL >>kernel.sched_autogroup_enabled = 0 >> >># Setup hugepages >>vm.hugetlb_shm_group = 26 >>vm.hugepages_treat_as_movable = 0 >>vm.nr_overcommit_hugepages = 512 >> >># The Huge Page Size is 2048kB, so for 35GB shared buffers the number >>is 17920 >>vm.nr_hugepages = 17920 >> >># Turn off the NUMA local pages reclaim as it leads to wrong caching >>strategy for databases >>vm.zone_reclaim_mode = 0 >> >>Environment: >> >>HUGETLB_SHM=yes >>LD_PRELOAD='/usr/lib64/libhugetlbfs.so' >>export HUGETLB_SHM LD_PRELOAD >> >>When it is stopped: >> >>2013-03-26 11:50:32 MSK 3775 @ from [vxid: txid:0] [] LOG: >>restartpoint complete: wrote 1685004 buffers (36.7%); 0 transaction >>log file(s) added, 0 removed, 555 recycled; write=3237.402 s, >>sync=0.071 s, total=3237.507 s; sync files=2673, longest=0.008 s, >>average=0.000 s >>2013-03-26 11:50:32 MSK 3775 @ from [vxid: txid:0] [] LOG: recovery >>restart point at 2538/6E154AC0 >>2013-03-26 11:50:32 MSK 3775 @ from [vxid: txid:0] [] DETAIL: last >>completed transaction was at log time 2013-03-26 11:50:31.613948+04 >>2013-03-26 11:50:32 MSK 3775 @ from [vxid: txid:0] [] LOG: >>restartpoint starting: xlog >>2013-03-26 11:51:16 MSK 3773 @ from [vxid:1/0 txid:0] [] WARNING: >>page 451 of relation base/16436/2686702648 is uninitialized >>2013-03-26 11:51:16 MSK 3773 @ from [vxid:1/0 txid:0] [] CONTEXT: >>xlog redo vacuum: rel 1663/16436/2686702648; blk 2485, >>lastBlockVacuumed 0 >>2013-03-26 11:51:16 MSK 3773 @ from [vxid:1/0 txid:0] [] PANIC: WAL >>contains references to invalid pages >>2013-03-26 11:51:16 MSK 3773 @ from [vxid:1/0 txid:0] [] CONTEXT: >>xlog redo vacuum: rel 1663/16436/2686702648; blk 2485, >>lastBlockVacuumed 0 >>2013-03-26 11:51:16 MSK 3770 @ from [vxid: txid:0] [] LOG: startup >>process (PID 3773) was terminated by signal 6: Aborted >>2013-03-26 11:51:16 MSK 3770 @ from [vxid: txid:0] [] LOG: >>terminating any other active server processes >> > >From /var/log/messages: >> >>Mar 26 10:50:52 tms2 kernel: : postmaster: page allocation failure. >>order:8, mode:0xd0 >>Mar 26 10:50:52 tms2 kernel: : Pid: 3774, comm: postmaster Not tainted >>2.6.32-279.22.1.el6.x86_64 #1 >>Mar 26 10:50:52 tms2 kernel: : Call Trace: >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8112343f>] ? >>__alloc_pages_nodemask+0x77f/0x940 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115d3e2>] ? >>kmem_getpages+0x62/0x170 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115dffa>] ? >>fallback_alloc+0x1ba/0x270 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115da4f>] ? >>cache_grow+0x2cf/0x320 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115dd79>] ? >>____cache_alloc_node+0x99/0x160 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff813fd455>] ? >>dma_pin_iovec_pages+0xb5/0x230 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8115eb49>] ? >>__kmalloc+0x189/0x220 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff813fd455>] ? >>dma_pin_iovec_pages+0xb5/0x230 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8141a47c>] ? >>lock_sock_nested+0xac/0xc0 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8146edaa>] ? >>tcp_recvmsg+0x4ca/0xe80 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffffa029a67e>] ? >>xfs_vm_write_end+0x2e/0x60 [xfs] >>Mar 26 10:50:52 tms2 kernel: : [<ffffffffa0292f39>] ? >>xfs_trans_unlocked_item+0x39/0x60 [xfs] >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8148f90a>] ? >>inet_recvmsg+0x5a/0x90 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81418b93>] ? >>sock_recvmsg+0x133/0x160 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffffa029f542>] ? >>xfs_rw_iunlock+0x32/0x40 [xfs] >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81090be0>] ? >>autoremove_wake_function+0x0/0x40 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81173fff>] ? >>__dentry_open+0x23f/0x360 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff811764ca>] ? >>do_sync_write+0xfa/0x140 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff811875d0>] ? >>do_filp_open+0x780/0xd60 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81418d0e>] ? >>sys_recvfrom+0xee/0x180 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81183105>] ? putname+0x35/0x50 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff81176842>] ? >>vfs_write+0x132/0x1a0 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff810d3f47>] ? >>audit_syscall_entry+0x1d7/0x200 >>Mar 26 10:50:52 tms2 kernel: : [<ffffffff8100b072>] ? >>system_call_fastpath+0x16/0x1b >> >>After restart: >> >>2013-03-27 10:43:01 MSK 1535 @ from [vxid: txid:0] [] LOG: database >>system was interrupted while in recovery at log time 2013-03-26 >>10:08:37 MSK >>2013-03-27 10:43:01 MSK 1535 @ from [vxid: txid:0] [] HINT: If this >>has occurred more than once some data might be corrupted and you might >>need to choose an earlier recovery target. >>2013-03-27 10:43:01 MSK 1535 @ from [vxid: txid:0] [] LOG: entering >>standby mode >>2013-03-27 10:43:01 MSK 1535 @ from [vxid:1/0 txid:0] [] LOG: redo >>starts at 2538/6E154AC0 >>2013-03-27 10:43:02 MSK 1535 @ from [vxid:1/0 txid:0] [] LOG: file >>"pg_subtrans/10B7" doesn't exist, reading as zeroes >>2013-03-27 10:43:02 MSK 1535 @ from [vxid:1/0 txid:0] [] CONTEXT: >>xlog redo xid assignment xtop 280482186: subxacts: 280483328 280483333 >>280483343 280483354 280483356 280483368 280483377 280483382 280483392 >>280483404 280483416 280483429 280483440 280483451 280483460 280483472 >>280483487 280483500 280483516 280483530 280483541 280483555 280483565 >>280483574 280483585 280483595 280483604 280483607 280483617 280483626 >>280483636 280483646 280483656 280483665 280483677 280483688 280483699 >>280483709 280483719 280483730 280483739 280483749 280483759 280483761 >>280483771 280483782 280483799 280483800 280483811 280483821 280483824 >>280483836 280483847 280483859 280483871 280483874 280483883 280483897 >>280483906 280483915 280483925 280483937 280483948 280483958 >>2013-03-27 10:43:02 MSK 1535 @ from [vxid:1/0 txid:0] [] LOG: file >>"pg_subtrans/10B7" doesn't exist, reading as zeroes >>2013-03-27 10:43:02 MSK 1535 @ from [vxid:1/0 txid:0] [] CONTEXT: >>xlog redo xid assignment xtop 280482270: subxacts: 280485056 280485070 >>280485083 280485086 280485098 280485113 280485132 280485144 280485156 >>280485167 280485178 280485188 280485201 280485217 280485234 280485249 >>280485267 280485293 280485309 280485327 280485333 280485345 280485353 >>280485373 280485388 280485405 280485420 280485434 280485457 280485476 >>280485482 280485507 280485516 280485531 280485537 280485550 280485565 >>280485568 280485585 280485587 280485601 280485613 280485634 280485639 >>280485656 280485669 280485684 280485690 280485693 280485712 280485730 >>280485754 280485757 280485779 280485801 280485808 280485811 280485830 >>280485856 280485880 280485900 280485920 280485941 280485946 >> >>[ skipped several more messages of this kind] >> >>2013-03-27 10:43:03 MSK 1535 @ from [vxid:1/0 txid:0] [] LOG: file >>"pg_subtrans/10B8" doesn't exist, reading as zeroes >>2013-03-27 10:43:03 MSK 1535 @ from [vxid:1/0 txid:0] [] CONTEXT: >>xlog redo xid assignment xtop 280549936: subxacts: 280555514 280555515 >>280555516 280555517 280555518 280555519 280555520 280555521 280555522 >>280555523 280555524 280555525 280555526 280555527 280555528 280555529 >>280555530 280555531 280555532 280555533 280555534 280555535 280555536 >>280555537 280555538 280555539 280555540 280555541 280555542 280555543 >>280555544 280555545 280555546 280555547 280555548 280555549 280555550 >>280555551 280555552 280555553 280555554 280555555 280555556 280555557 >>280555558 280555559 280555560 280555561 280555562 280555563 280555564 >>280555565 280555566 280555567 280555568 280555569 280555570 280555571 >>280555572 280555573 280555574 280555575 280555576 280555577 >>2013-03-27 10:43:03 MSK 1535 @ from [vxid:1/0 txid:0] [] FATAL: >>could not access status of transaction 280557568 >>2013-03-27 10:43:03 MSK 1535 @ from [vxid:1/0 txid:0] [] DETAIL: >>Could not read from file "pg_subtrans/10B8" at offset 253952: Success. >>2013-03-27 10:43:03 MSK 1535 @ from [vxid:1/0 txid:0] [] CONTEXT: >>xlog redo xid assignment xtop 280555981: subxacts: 280557520 280557521 >>280557522 280557523 280557524 280557525 280557526 280557527 280557528 >>280557529 280557530 280557531 280557532 280557533 280557534 280557535 >>280557536 280557537 280557538 280557539 280557540 280557541 280557542 >>280557543 280557544 280557545 280557546 280557547 280557548 280557549 >>280557550 280557551 280557552 280557553 280557554 280557555 280557556 >>280557557 280557558 280557559 280557560 280557561 280557562 280557563 >>280557564 280557565 280557566 280557567 280557568 280557569 280557570 >>280557571 280557572 280557573 280557574 280557575 280557576 280557577 >>280557578 280557579 280557580 280557581 280557582 280557583 >>2013-03-27 10:43:03 MSK 1532 @ from [vxid: txid:0] [] LOG: startup >>process (PID 1535) exited with exit code 1 >>2013-03-27 10:43:03 MSK 1532 @ from [vxid: txid:0] [] LOG: >>terminating any other active server processes >> >>Thank you in advance. >> >>-- >>Kind regards, >>Sergey Konoplev >>Database and Software Consultant >> >>Profile: http://www.linkedin.com/in/grayhemp >>Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) >>888-1979 >>Skype: gray-hemp >>Jabber: gray...@gmail.com > > > --- > Please excuse brevity and formatting - I am writing this on my mobile phone. > --- > Please excuse brevity and formatting - I am writing this on my mobile phone. -- Kind regards, Sergey Konoplev Database and Software Consultant Profile: http://www.linkedin.com/in/grayhemp Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979 Skype: gray-hemp Jabber: gray...@gmail.com -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs