Hi PostgreSQL community:

We have a three-node PostgreSQL BDR setup.  One of our nodes went down due to 
a power issue.  After bringing the server back online, the OS reported the 
need to repair some files.  Once that completed and we restarted the 
PostgreSQL service, we noticed it was crashing very quickly.  Checking the 
logs revealed some panics (please see below).  The other two nodes appear to 
be up and running and replicating with each other, but the WAL they retain 
for the failed node's replication slots is slowly growing.
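For what it's worth, we are gauging that backlog on the two healthy nodes 
with a query along these lines (on 9.4, pg_replication_slots has no lag 
column, so we diff each slot's restart_lsn against the current WAL position):

  -- Run on each healthy node; shows how much WAL each slot is pinning.
  SELECT slot_name,
         active,
         pg_size_pretty(
           pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)
         ) AS retained_wal
  FROM pg_replication_slots;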


A description of what you are trying to achieve and what results you expect:
Repair the damaged node and get it replicating to/from the other nodes in our 
cluster (a rough sketch of the part/re-join path we expect is below).
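If the standard recovery path here is to part the damaged node and re-join it 
from a fresh copy, we would expect it to look roughly like the following 
(node names and DSNs are placeholders, and we are going from memory on the 
BDR 0.9-era function names, so please correct us):

  -- On a healthy node: remove the damaged node from the BDR group.
  SELECT bdr.bdr_part_by_node_names(ARRAY['node3']);

  -- On the rebuilt node, against a freshly initdb'd instance:
  SELECT bdr.bdr_group_join(
    local_node_name   := 'node3',
    node_external_dsn := 'host=node3 port=54330 dbname=ourdb',
    join_using_dsn    := 'host=node1 port=54330 dbname=ourdb'
  );

Is that the right approach after this kind of damage, or can the slot files 
be repaired in place?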

PostgreSQL version number you are running:
PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 
20120313 (Red Hat 4.4.7-11), 64-bit

How you installed PostgreSQL:
Downloaded the PostgreSQL + BDR RPMs from 2ndQuadrant.

Changes made to the settings in the postgresql.conf file (listed as 
"name";"setting";"source"; a few rows are session/client values picked up by 
the reporting connection rather than postgresql.conf entries):
"application_name";"pgAdmin III - Query Tool";"client"
"bdr.log_conflicts_to_table";"on";"configuration file"
"bytea_output";"escape";"session"
"client_encoding";"SQL_ASCII";"session"
"client_min_messages";"notice";"session"
"DateStyle";"ISO, MDY";"session"
"default_sequenceam";"bdr";"configuration file"
"default_text_search_config";"pg_catalog.english";"configuration file"
"dynamic_shared_memory_type";"posix";"configuration file"
"lc_messages";"en_US.UTF-8";"configuration file"
"lc_monetary";"en_US.UTF-8";"configuration file"
"lc_numeric";"en_US.UTF-8";"configuration file"
"lc_time";"en_US.UTF-8";"configuration file"
"listen_addresses";"*";"configuration file"
"log_destination";"stderr";"configuration file"
"log_directory";"/var/log/postgresql";"configuration file"
"log_error_verbosity";"default";"configuration file"
"log_filename";"postgresql-%Y-%m-%d_%H%M%S.log";"configuration file"
"log_line_prefix";"t:%m d=%d p=%p a=%a%q ";"configuration file"
"log_min_messages";"info";"configuration file"
"log_rotation_age";"1d";"configuration file"
"log_rotation_size";"0";"configuration file"
"log_timezone";"UTC";"configuration file"
"log_truncate_on_rotation";"on";"configuration file"
"logging_collector";"on";"configuration file"
"max_connections";"100";"configuration file"
"max_replication_slots";"10";"configuration file"
"max_stack_depth";"2MB";"environment variable"
"max_wal_senders";"10";"configuration file"
"max_worker_processes";"10";"configuration file"
"port";"54330";"command line"
"shared_buffers";"128MB";"configuration file"
"shared_preload_libraries";"bdr";"configuration file"
"TimeZone";"UTC";"configuration file"
"track_commit_timestamp";"on";"configuration file"
"wal_level";"logical";"configuration file"

Operating system and version:
CentOS release 6.6
Linux <removed server name> 2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 
UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

For questions about any kind of error:

What program you're using to connect to PostgreSQL:
Django 1.7.0

Is there anything relevant or unusual in the PostgreSQL server logs?:
Yes.  The checkpointer hit a PANIC when attempting to create a replication 
identifier checkpoint.  Subsequent restarts of PostgreSQL failed with a PANIC 
complaining that a replication slot state file could not be opened.

...
t:2016-05-19 01:14:51.668 UTC d= p=144 a=PANIC:  could not create replication 
identifier checkpoint "pg_logical/checkpoints/8-F3923F98.ckpt.tmp": Invalid 
argument
t:2016-05-19 01:14:51.671 UTC d= p=9729 a=WARNING:  could not create 
relation-cache initialization file "global/pg_internal.init.9729": Invalid 
argument
t:2016-05-19 01:14:51.671 UTC d= p=9729 a=DETAIL:  Continuing anyway, but 
there's something wrong.
t:2016-05-19 01:14:51.674 UTC d= p=133 a=LOG:  checkpointer process (PID 144) 
was terminated by signal 6: Aborted
t:2016-05-19 01:14:51.674 UTC d= p=133 a=LOG:  terminating any other active 
server processes
t:2016-05-19 01:14:51.675 UTC d= p=147 a=WARNING:  terminating connection 
because of crash of another server process
t:2016-05-19 01:14:51.675 UTC d= p=147 a=DETAIL:  The postmaster has commanded 
this server process to roll back the current transaction and exit, because 
another server process exited abnormally and possibly corrupted shared memory.
t:2016-05-19 01:14:51.675 UTC d= p=147 a=HINT:  In a moment you should be able 
to reconnect to the database and repeat your command.
t:2016-05-19 01:14:51.675 UTC d= p=148 a=LOG:  could not open temporary 
statistics file "pg_stat/global.tmp": Invalid argument
t:2016-05-19 01:14:51.694 UTC d= p=9729 a=WARNING:  terminating connection 
because of crash of another server process
t:2016-05-19 01:14:51.694 UTC d= p=9729 a=DETAIL:  The postmaster has commanded 
this server process to roll back the current transaction and exit, because 
another server process exited abnormally and possibly corrupted shared memory.
t:2016-05-19 01:14:51.694 UTC d= p=9729 a=HINT:  In a moment you should be able 
to reconnect to the database and repeat your command.
t:2016-05-19 01:14:51.786 UTC d= p=9730 a=WARNING:  terminating connection 
because of crash of another server process
t:2016-05-19 01:14:51.786 UTC d= p=9730 a=DETAIL:  The postmaster has commanded 
this server process to roll back the current transaction and exit, because 
another server process exited abnormally and possibly corrupted shared memory.
t:2016-05-19 01:14:51.786 UTC d= p=9730 a=HINT:  In a moment you should be able 
to reconnect to the database and repeat your command.
t:2016-05-19 01:14:51.787 UTC d= p=9731 a=WARNING:  terminating connection 
because of crash of another server process
t:2016-05-19 01:14:51.787 UTC d= p=9731 a=DETAIL:  The postmaster has commanded 
this server process to roll back the current transaction and exit, because 
another server process exited abnormally and possibly corrupted shared memory.
t:2016-05-19 01:14:51.787 UTC d= p=9731 a=HINT:  In a moment you should be able 
to reconnect to the database and repeat your command.
t:2016-05-19 01:14:51.819 UTC d= p=133 a=LOG:  all server processes terminated; 
reinitializing
t:2016-05-19 01:14:51.819 UTC d= p=133 a=FATAL:  could not open 
"global/bdr.stat.tmp": Invalid argument
<end of the log file>

<start of next log file>
t:2016-05-19 15:01:58.989 UTC d= p=40 a=LOG:  database system was interrupted; 
last known up at 2016-05-19 00:58:43 UTC
t:2016-05-19 15:01:59.251 UTC d=postgres p=41 a=[unknown] FATAL:  the database 
system is starting up
t:2016-05-19 15:01:59.390 UTC d= p=40 a=PANIC:  could not open file 
"pg_replslot/bdr_19814_6258399326068244492_2_19814__/state": No such file or 
directory
t:2016-05-19 15:01:59.390 UTC d= p=38 a=LOG:  startup process (PID 40) was 
terminated by signal 6: Aborted
t:2016-05-19 15:01:59.390 UTC d= p=38 a=LOG:  aborting startup due to startup 
process failure
<end of the log file>

<start of next log file>
t:2016-05-19 15:04:15.139 UTC d= p=40 a=LOG:  database system was interrupted; 
last known up at 2016-05-19 00:58:43 UTC
t:2016-05-19 15:04:15.289 UTC d= p=40 a=PANIC:  could not open file 
"pg_replslot/bdr_19814_6258399326068244492_2_19814__/state": No such file or 
directory
t:2016-05-19 15:04:15.289 UTC d= p=38 a=LOG:  startup process (PID 40) was 
terminated by signal 6: Aborted
t:2016-05-19 15:04:15.289 UTC d= p=38 a=LOG:  aborting startup due to startup 
process failure
<end of the log file>


What you were doing when the error happened / how to cause the error:
A power surge may have caused the server to go offline.  On restart, the OS 
found problems with files that it needed to repair.  Starting PostgreSQL 
appears to complete, but the process then crashes.



Cameron Smith


