On Sat, Jun 1, 2019 at 8:53 PM Adrian Klaver <adrian.kla...@aklaver.com>
wrote:

> On 6/1/19 5:32 PM, Tom K wrote:
>
> >
> >
> > Trying what we did above but on the second node:
> >
>
> Was this node the primary?
>
> To me the below looks like there are replication slots set up that are
> failing. Not sure how to deal with this at the moment. You might try
> single-user mode:
>
> https://www.postgresql.org/docs/10/app-postgres.html
> Single-User Mode
> ...
>
> and see if that at least gets the server started. This is a highly
> restricted so do not expect much usability.
>

These servers did crash before however didn't' notice anything particularly
wrong at the time. Perhaps a better check was in order.

Here is the result from attempting to start the DB in single user mode.

-bash-4.2$ /usr/pgsql-10/bin/postgres --single -D /data/patroni
--config-file=/data/patroni/postgresql.conf --hot_standby=off
--listen_addresses=192.168.0.124 --max_worker_processes=8
--max_locks_per_transaction=64 --wal_level=replica --cluster_name=postgres
--wal_log_hints=on --max_wal_senders=10 --track_commit_timestamp=off
--max_prepared_transactions=0 --port=5432 --max_replication_slots=10
--max_connections=20 -d 5 2>&1
2019-06-01 23:04:20.876 EDT [31831] DEBUG:  invoking
IpcMemoryCreate(size=144687104)
2019-06-01 23:04:20.877 EDT [31831] DEBUG:  mmap(144703488) with
MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2019-06-01 23:04:20.890 EDT [31831] DEBUG:  SlruScanDirectory invoking
callback on pg_notify/0000
2019-06-01 23:04:20.890 EDT [31831] DEBUG:  removing file "pg_notify/0000"
2019-06-01 23:04:20.890 EDT [31831] DEBUG:  dynamic shared memory system
will support 128 segments
2019-06-01 23:04:20.891 EDT [31831] DEBUG:  created dynamic shared memory
control segment 846930886 (3088 bytes)
2019-06-01 23:04:20.891 EDT [31831] DEBUG:  InitPostgres
2019-06-01 23:04:20.891 EDT [31831] DEBUG:  my backend ID is 1
2019-06-01 23:04:20.891 EDT [31831] LOG:  database system was interrupted;
last known up at 2019-04-28 06:06:24 EDT
2019-06-01 23:04:21.239 EDT [31831] LOG:  invalid record length at
0/4C35CDF8: wanted 24, got 0
2019-06-01 23:04:21.239 EDT [31831] LOG:  invalid primary checkpoint record
2019-06-01 23:04:21.239 EDT [31831] LOG:  using previous checkpoint record
at 0/4C34EDA8
2019-06-01 23:04:21.239 EDT [31831] DEBUG:  redo record is at 0/4C34ED70;
shutdown FALSE
2019-06-01 23:04:21.239 EDT [31831] DEBUG:  next transaction ID: 0:1409831;
next OID: 237578
2019-06-01 23:04:21.239 EDT [31831] DEBUG:  next MultiXactId: 48; next
MultiXactOffset: 174
2019-06-01 23:04:21.239 EDT [31831] DEBUG:  oldest unfrozen transaction ID:
549, in database 1
2019-06-01 23:04:21.239 EDT [31831] DEBUG:  oldest MultiXactId: 1, in
database 1
2019-06-01 23:04:21.239 EDT [31831] DEBUG:  commit timestamp Xid
oldest/newest: 0/0
2019-06-01 23:04:21.239 EDT [31831] DEBUG:  transaction ID wrap limit is
2147484196, limited by database with OID 1
2019-06-01 23:04:21.239 EDT [31831] DEBUG:  MultiXactId wrap limit is
2147483648, limited by database with OID 1
2019-06-01 23:04:21.239 EDT [31831] DEBUG:  starting up replication slots
2019-06-01 23:04:21.239 EDT [31831] DEBUG:  starting up replication origin
progress state
2019-06-01 23:04:21.239 EDT [31831] PANIC:  replication checkpoint has
wrong magic 0 instead of 307747550
Aborted
-bash-4.2$
-bash-4.2$ /usr/pgsql-10/bin/postgres --single -D /data/patroni
--config-file=/data/patroni/postgresql.conf
2019-06-01 23:06:28.679 EDT [31969] LOG:  database system was interrupted;
last known up at 2019-04-28 06:06:24 EDT
2019-06-01 23:06:28.953 EDT [31969] LOG:  invalid record length at
0/4C35CDF8: wanted 24, got 0
2019-06-01 23:06:28.953 EDT [31969] LOG:  invalid primary checkpoint record
2019-06-01 23:06:28.953 EDT [31969] LOG:  using previous checkpoint record
at 0/4C34EDA8
2019-06-01 23:06:28.953 EDT [31969] PANIC:  replication checkpoint has
wrong magic 0 instead of 307747550
Aborted
-bash-4.2$



>
> > Removing the recovery.conf and restarting yields the following messages:
> >
> > 2019-06-01 20:31:12.231 EDT [21910] LOG:  database system was
> > interrupted; last known up at 2019-04-28 06:06:24 EDT
> > 2019-06-01 20:31:12.505 EDT [21910] LOG:  invalid record length at
> > 0/4C35CDF8: wanted 24, got 0
> > 2019-06-01 20:31:12.505 EDT [21910] LOG:  invalid primary checkpoint
> record
> > 2019-06-01 20:31:12.505 EDT [21910] LOG:  using previous checkpoint
> > record at 0/4C34EDA8
> > 2019-06-01 20:31:12.505 EDT [21910] DEBUG:  redo record is at
> > 0/4C34ED70; shutdown FALSE
> > 2019-06-01 20:31:12.505 EDT [21910] DEBUG:  next transaction ID:
> > 0:1409831; next OID: 237578
> > 2019-06-01 20:31:12.505 EDT [21910] DEBUG:  next MultiXactId: 48; next
> > MultiXactOffset: 174
> > 2019-06-01 20:31:12.505 EDT [21910] DEBUG:  oldest unfrozen transaction
> > ID: 549, in database 1
> > 2019-06-01 20:31:12.505 EDT [21910] DEBUG:  oldest MultiXactId: 1, in
> > database 1
> > 2019-06-01 20:31:12.505 EDT [21910] DEBUG:  commit timestamp Xid
> > oldest/newest: 0/0
> > 2019-06-01 20:31:12.505 EDT [21910] DEBUG:  transaction ID wrap limit is
> > 2147484196, limited by database with OID 1
> > 2019-06-01 20:31:12.505 EDT [21910] DEBUG:  MultiXactId wrap limit is
> > 2147483648, limited by database with OID 1
> > 2019-06-01 20:31:12.505 EDT [21910] DEBUG:  starting up replication slots
> > 2019-06-01 20:31:12.505 EDT [21910] DEBUG:  starting up replication
> > origin progress state
> > 2019-06-01 20:31:12.505 EDT [21910] PANIC:  replication checkpoint has
> > wrong magic 0 instead of 307747550
> > 2019-06-01 20:31:12.506 EDT [21908] DEBUG:  reaping dead processes
> > 2019-06-01 20:31:12.506 EDT [21908] LOG:  startup process (PID 21910)
> > was terminated by signal 6: Aborted
> > 2019-06-01 20:31:12.506 EDT [21908] LOG:  aborting startup due to
> > startup process failure
> > 2019-06-01 20:31:12.506 EDT [21908] DEBUG:  shmem_exit(1): 0
> > before_shmem_exit callbacks to make
> > 2019-06-01 20:31:12.506 EDT [21908] DEBUG:  shmem_exit(1): 5
> > on_shmem_exit callbacks to make
> > 2019-06-01 20:31:12.506 EDT [21908] DEBUG:  cleaning up dynamic shared
> > memory control segment with ID 976986759
> > 2019-06-01 20:31:12.508 EDT [21908] DEBUG:  proc_exit(1): 2 callbacks to
> > make
> > 2019-06-01 20:31:12.508 EDT [21908] LOG:  database system is shut down
> > 2019-06-01 20:31:12.508 EDT [21908] DEBUG:  exit(1)
> > 2019-06-01 20:31:12.508 EDT [21908] DEBUG:  shmem_exit(-1): 0
> > before_shmem_exit callbacks to make
> > 2019-06-01 20:31:12.508 EDT [21908] DEBUG:  shmem_exit(-1): 0
> > on_shmem_exit callbacks to make
> > 2019-06-01 20:31:12.508 EDT [21908] DEBUG:  proc_exit(-1): 0 callbacks
> > to make
> > 2019-06-01 20:31:12.510 EDT [21909] DEBUG:  logger shutting down
> > 2019-06-01 20:31:12.510 EDT [21909] DEBUG:  shmem_exit(0): 0
> > before_shmem_exit callbacks to make
> > 2019-06-01 20:31:12.510 EDT [21909] DEBUG:  shmem_exit(0): 0
> > on_shmem_exit callbacks to make
> > 2019-06-01 20:31:12.510 EDT [21909] DEBUG:  proc_exit(0): 0 callbacks to
> > make
> > 2019-06-01 20:31:12.510 EDT [21909] DEBUG:  exit(0)
> > 2019-06-01 20:31:12.510 EDT [21909] DEBUG:  shmem_exit(-1): 0
> > before_shmem_exit callbacks to make
> > 2019-06-01 20:31:12.510 EDT [21909] DEBUG:  shmem_exit(-1): 0
> > on_shmem_exit callbacks to make
> > 2019-06-01 20:31:12.510 EDT [21909] DEBUG:  proc_exit(-1): 0 callbacks
> > to make
> >
> >             --
> >             Adrian Klaver
> >             adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>

Reply via email to