On Sat, Jun 1, 2019 at 3:32 PM Tom K <tomk...@gmail.com> wrote:

> On Sat, Jun 1, 2019 at 9:55 AM Adrian Klaver <adrian.kla...@aklaver.com>
> wrote:
>
>> On 5/31/19 7:53 PM, Tom K wrote:
>> >
>> >     There are two places to connect with the Patroni community: on
>> >     github, via Issues and PRs, and on channel #patroni in the
>> >     PostgreSQL Slack. If you're using Patroni, or just interested,
>> >     please join us.
>> >
>> > Will post there as well. Thank you. My thinking was to post here first
>> > since I suspect the Patroni community will simply refer me back here
>> > given that the PostgreSQL errors are originating directly from
>> > PostgreSQL.
>> >
>> >     That being said, can you start the copied Postgres instance without
>> >     using the Patroni instrumentation?
>> >
>> > Yes, that is something I have been trying to do actually. But I hit a
>> > dead end with the three errors above.
>> >
>> > So what I did is to copy a single node's backed up copy of the data
>> > files to */data/patroni* of the same node ( this is the psql data
>> > directory as defined through patroni ) of the same node then ran this (
>> > psql03 = 192.168.0.118 ):
>> >
>> > # sudo su - postgres
>> > $ /usr/pgsql-10/bin/postgres -D /data/patroni
>> > --config-file=/data/patroni/postgresql.conf
>> > --listen_addresses=192.168.0.118 --max_worker_processes=8
>> > --max_locks_per_transaction=64 --wal_level=replica
>> > --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432
>> > --max_replication_slots=10 --max_connections=100 --hot_standby=on
>> > --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 -d 5
>>
>> Why all the options?
>> That should be covered in postgresql.conf, no?
>>
>> >
>> > This resulted in one of the 3 messages above. Hence the post here.
>> > If I can start a single instance, I should be fine since I could then 1)
>> > replicate over to the other two or 2) simply take a dump, reinitialize
>> > all the databases then restore the dump.
>> >
>>
>> What if you move the recovery.conf file out?
>
> Will try.
>
>>
>> The below looks like missing/corrupted/incorrect files. Hard to tell
>> without knowing what Patroni did?
>
> Storage disappeared from underneath these clusters. The OS was of course
> still in memory making futile attempts to write to disk, which would never
> complete.
>
> My best guess is that Patroni or postgres was in the middle of some
> writes across the clusters when the failure occurred.
>
Of note are the characters f2W below. I see nothing in the postgres source code to indicate that this is any recognizable postgres message. A part of me suspects that the postgres binaries got corrupted. I had this happen once with glib-common, and a reinstall fixed it. However, the checksum of the postgres binaries matches a standalone install perfectly, so that should not be the issue.

>
>>
>> > Using the above procedure I get one of three error messages when using
>> > the data files of each node:
>> >
>> > [ PSQL01 ]
>> > postgres: postgres: startup process waiting for 000000010000000000000008
>> >
>> > [ PSQL02 ]
>> > PANIC: replication checkpoint has wrong magic 0 instead of 307747550
>> >
>> > [ PSQL03 ]
>> > FATAL: syntax error in history file: f2W
>> >
>> > And I can't start any one of them.
>> >
>> > > Thx,
>> > > TK
>> >
>> > --
>> > Adrian Klaver
>> > adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>
>>
>> --
>> Adrian Klaver
>> adrian.kla...@aklaver.com
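
For anyone who lands on this thread with the same two messages ("wrong magic 0 instead of 307747550" and "syntax error in history file"), here is a rough sketch of how the suspect files could be inspected offline before another start attempt. It rests on several assumptions that are not stated in the thread: that the magic is the first four bytes of pg_logical/replorigin_checkpoint in little-endian order (x86-64), that the timeline history files live in pg_wal/ under the data directory (the PostgreSQL 10 layout), and that a valid history line starts with a numeric timeline ID followed by a switchpoint such as 0/8000098. The function names are made up for illustration, and the line pattern only approximates what the server accepts.

```python
import re
import struct
from pathlib import Path

# Value the server expects at the start of pg_logical/replorigin_checkpoint.
# 0x1257DADE == 307747550, the number quoted in the PANIC message.
REPLICATION_STATE_MAGIC = 0x1257DADE

# Approximation of a valid history line: "<timeline id> <hi>/<lo> [reason]".
HISTORY_LINE = re.compile(r"^\s*\d+\s+[0-9A-Fa-f]+/[0-9A-Fa-f]+")


def read_checkpoint_magic(raw):
    """Return the first four bytes as an unsigned int (little-endian assumed)."""
    if len(raw) < 4:
        raise ValueError("file is shorter than the 4-byte magic")
    return struct.unpack("<I", raw[:4])[0]


def bad_history_lines(text):
    """Return (line number, line) pairs that do not look like history entries."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines and '#' comments are skipped, as the server does.
        if not stripped or stripped.startswith("#"):
            continue
        if not HISTORY_LINE.match(line):
            bad.append((lineno, line))
    return bad


def check_data_dir(data_dir):
    """Collect human-readable findings for one data directory (read-only)."""
    findings = []
    root = Path(data_dir)

    ckpt = root / "pg_logical" / "replorigin_checkpoint"
    if ckpt.exists():
        try:
            magic = read_checkpoint_magic(ckpt.read_bytes())
        except ValueError as exc:
            findings.append(f"{ckpt}: {exc}")
        else:
            if magic != REPLICATION_STATE_MAGIC:
                findings.append(
                    f"{ckpt}: magic {magic}, expected {REPLICATION_STATE_MAGIC}"
                )

    wal_dir = root / "pg_wal"
    if wal_dir.is_dir():
        for hist in sorted(wal_dir.glob("*.history")):
            # errors="replace" keeps going even if the file holds binary junk.
            text = hist.read_text(errors="replace")
            for lineno, line in bad_history_lines(text):
                findings.append(f"{hist}:{lineno}: unparseable line {line!r}")

    return findings
```

Calling check_data_dir("/data/patroni") on each node's copy returns a list of findings and changes nothing on disk. A magic of 0 would suggest the checkpoint file was zero-filled rather than partially written, which would be consistent with storage vanishing mid-write, but that is an inference, not something this sketch can prove.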