On Sat, Jun 1, 2019 at 8:53 PM Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> On 6/1/19 5:32 PM, Tom K wrote: > > > > > > > Trying what we did above but on the second node: > > > > Was this node the primary? > > To me the below looks like there are replication slots set up that are > failing. Not sure how to deal with this at the moment. You might try > single-user mode: > > https://www.postgresql.org/docs/10/app-postgres.html > Single-User Mode > ... > > and see if that at least gets the server started. This is a highly > restricted so do not expect much usability. > These servers did crash before however didn't' notice anything particularly wrong at the time. Perhaps a better check was in order. Here is the result from attempting to start the DB in single user mode. -bash-4.2$ /usr/pgsql-10/bin/postgres --single -D /data/patroni --config-file=/data/patroni/postgresql.conf --hot_standby=off --listen_addresses=192.168.0.124 --max_worker_processes=8 --max_locks_per_transaction=64 --wal_level=replica --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432 --max_replication_slots=10 --max_connections=20 -d 5 2>&1 2019-06-01 23:04:20.876 EDT [31831] DEBUG: invoking IpcMemoryCreate(size=144687104) 2019-06-01 23:04:20.877 EDT [31831] DEBUG: mmap(144703488) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory 2019-06-01 23:04:20.890 EDT [31831] DEBUG: SlruScanDirectory invoking callback on pg_notify/0000 2019-06-01 23:04:20.890 EDT [31831] DEBUG: removing file "pg_notify/0000" 2019-06-01 23:04:20.890 EDT [31831] DEBUG: dynamic shared memory system will support 128 segments 2019-06-01 23:04:20.891 EDT [31831] DEBUG: created dynamic shared memory control segment 846930886 (3088 bytes) 2019-06-01 23:04:20.891 EDT [31831] DEBUG: InitPostgres 2019-06-01 23:04:20.891 EDT [31831] DEBUG: my backend ID is 1 2019-06-01 23:04:20.891 EDT [31831] LOG: database system was interrupted; last known up at 2019-04-28 06:06:24 EDT 2019-06-01 23:04:21.239 EDT [31831] LOG: invalid record length at 0/4C35CDF8: wanted 24, got 0 2019-06-01 23:04:21.239 EDT [31831] LOG: invalid primary checkpoint record 2019-06-01 23:04:21.239 EDT [31831] LOG: using previous checkpoint record at 0/4C34EDA8 2019-06-01 23:04:21.239 EDT [31831] DEBUG: redo record is at 0/4C34ED70; shutdown FALSE 2019-06-01 23:04:21.239 EDT [31831] DEBUG: next transaction ID: 0:1409831; next OID: 237578 2019-06-01 23:04:21.239 EDT [31831] DEBUG: next MultiXactId: 48; next MultiXactOffset: 174 2019-06-01 23:04:21.239 EDT [31831] DEBUG: oldest unfrozen transaction ID: 549, in database 1 2019-06-01 23:04:21.239 EDT [31831] DEBUG: oldest MultiXactId: 1, in database 1 2019-06-01 23:04:21.239 EDT [31831] DEBUG: commit timestamp Xid oldest/newest: 0/0 2019-06-01 23:04:21.239 EDT [31831] DEBUG: transaction ID wrap limit is 2147484196, limited by database with OID 1 2019-06-01 23:04:21.239 EDT [31831] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1 2019-06-01 23:04:21.239 EDT [31831] DEBUG: starting up replication slots 2019-06-01 23:04:21.239 EDT [31831] DEBUG: starting up replication origin progress state 2019-06-01 23:04:21.239 EDT [31831] PANIC: replication checkpoint has wrong magic 0 instead of 307747550 Aborted -bash-4.2$ -bash-4.2$ /usr/pgsql-10/bin/postgres --single -D /data/patroni --config-file=/data/patroni/postgresql.conf 2019-06-01 23:06:28.679 EDT [31969] LOG: database system was interrupted; last known up at 2019-04-28 06:06:24 EDT 2019-06-01 23:06:28.953 EDT [31969] LOG: invalid record length at 0/4C35CDF8: wanted 24, got 0 2019-06-01 23:06:28.953 EDT [31969] LOG: invalid primary checkpoint record 2019-06-01 23:06:28.953 EDT [31969] LOG: using previous checkpoint record at 0/4C34EDA8 2019-06-01 23:06:28.953 EDT [31969] PANIC: replication checkpoint has wrong magic 0 instead of 307747550 Aborted -bash-4.2$ > > > Removing the recovery.conf and restarting yields the following messages: > > > > 2019-06-01 20:31:12.231 EDT [21910] LOG: database system was > > interrupted; last known up at 2019-04-28 06:06:24 EDT > > 2019-06-01 20:31:12.505 EDT [21910] LOG: invalid record length at > > 0/4C35CDF8: wanted 24, got 0 > > 2019-06-01 20:31:12.505 EDT [21910] LOG: invalid primary checkpoint > record > > 2019-06-01 20:31:12.505 EDT [21910] LOG: using previous checkpoint > > record at 0/4C34EDA8 > > 2019-06-01 20:31:12.505 EDT [21910] DEBUG: redo record is at > > 0/4C34ED70; shutdown FALSE > > 2019-06-01 20:31:12.505 EDT [21910] DEBUG: next transaction ID: > > 0:1409831; next OID: 237578 > > 2019-06-01 20:31:12.505 EDT [21910] DEBUG: next MultiXactId: 48; next > > MultiXactOffset: 174 > > 2019-06-01 20:31:12.505 EDT [21910] DEBUG: oldest unfrozen transaction > > ID: 549, in database 1 > > 2019-06-01 20:31:12.505 EDT [21910] DEBUG: oldest MultiXactId: 1, in > > database 1 > > 2019-06-01 20:31:12.505 EDT [21910] DEBUG: commit timestamp Xid > > oldest/newest: 0/0 > > 2019-06-01 20:31:12.505 EDT [21910] DEBUG: transaction ID wrap limit is > > 2147484196, limited by database with OID 1 > > 2019-06-01 20:31:12.505 EDT [21910] DEBUG: MultiXactId wrap limit is > > 2147483648, limited by database with OID 1 > > 2019-06-01 20:31:12.505 EDT [21910] DEBUG: starting up replication slots > > 2019-06-01 20:31:12.505 EDT [21910] DEBUG: starting up replication > > origin progress state > > 2019-06-01 20:31:12.505 EDT [21910] PANIC: replication checkpoint has > > wrong magic 0 instead of 307747550 > > 2019-06-01 20:31:12.506 EDT [21908] DEBUG: reaping dead processes > > 2019-06-01 20:31:12.506 EDT [21908] LOG: startup process (PID 21910) > > was terminated by signal 6: Aborted > > 2019-06-01 20:31:12.506 EDT [21908] LOG: aborting startup due to > > startup process failure > > 2019-06-01 20:31:12.506 EDT [21908] DEBUG: shmem_exit(1): 0 > > before_shmem_exit callbacks to make > > 2019-06-01 20:31:12.506 EDT [21908] DEBUG: shmem_exit(1): 5 > > on_shmem_exit callbacks to make > > 2019-06-01 20:31:12.506 EDT [21908] DEBUG: cleaning up dynamic shared > > memory control segment with ID 976986759 > > 2019-06-01 20:31:12.508 EDT [21908] DEBUG: proc_exit(1): 2 callbacks to > > make > > 2019-06-01 20:31:12.508 EDT [21908] LOG: database system is shut down > > 2019-06-01 20:31:12.508 EDT [21908] DEBUG: exit(1) > > 2019-06-01 20:31:12.508 EDT [21908] DEBUG: shmem_exit(-1): 0 > > before_shmem_exit callbacks to make > > 2019-06-01 20:31:12.508 EDT [21908] DEBUG: shmem_exit(-1): 0 > > on_shmem_exit callbacks to make > > 2019-06-01 20:31:12.508 EDT [21908] DEBUG: proc_exit(-1): 0 callbacks > > to make > > 2019-06-01 20:31:12.510 EDT [21909] DEBUG: logger shutting down > > 2019-06-01 20:31:12.510 EDT [21909] DEBUG: shmem_exit(0): 0 > > before_shmem_exit callbacks to make > > 2019-06-01 20:31:12.510 EDT [21909] DEBUG: shmem_exit(0): 0 > > on_shmem_exit callbacks to make > > 2019-06-01 20:31:12.510 EDT [21909] DEBUG: proc_exit(0): 0 callbacks to > > make > > 2019-06-01 20:31:12.510 EDT [21909] DEBUG: exit(0) > > 2019-06-01 20:31:12.510 EDT [21909] DEBUG: shmem_exit(-1): 0 > > before_shmem_exit callbacks to make > > 2019-06-01 20:31:12.510 EDT [21909] DEBUG: shmem_exit(-1): 0 > > on_shmem_exit callbacks to make > > 2019-06-01 20:31:12.510 EDT [21909] DEBUG: proc_exit(-1): 0 callbacks > > to make > > > > -- > > Adrian Klaver > > adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com> > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >