On Sat, Jun 1, 2019 at 3:32 PM Tom K <tomk...@gmail.com> wrote:

>
>
> On Sat, Jun 1, 2019 at 9:55 AM Adrian Klaver <adrian.kla...@aklaver.com> wrote:
>
>> On 5/31/19 7:53 PM, Tom K wrote:
>> >
>>
>> >     There are two places to connect with the Patroni community: on github,
>> >     via Issues and PRs, and on channel #patroni in the PostgreSQL Slack.
>> >     If you're using Patroni, or just interested, please join us.
>> >
>> >
>> > Will post there as well. Thank you. My thinking was to post here first,
>> > since I suspect the Patroni community will simply refer me back here,
>> > given that the errors are coming directly from PostgreSQL.
>> >
>> >
>> >     That being said, can you start the copied Postgres instance without
>> >     using the Patroni instrumentation?
>> >
>> >
>> > Yes, that is something I have been trying to do actually.  But I hit a
>> > dead end with the three errors above.
>> >
>> > So what I did was copy a single node's backed-up data files to
>> > */data/patroni* on that same node (this is the PostgreSQL data directory
>> > as defined through Patroni), then ran this (psql03 = 192.168.0.118):
>> >
>> > # sudo su - postgres
>> > $ /usr/pgsql-10/bin/postgres -D /data/patroni \
>> >     --config-file=/data/patroni/postgresql.conf \
>> >     --listen_addresses=192.168.0.118 --max_worker_processes=8 \
>> >     --max_locks_per_transaction=64 --wal_level=replica \
>> >     --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432 \
>> >     --max_replication_slots=10 --max_connections=100 --hot_standby=on \
>> >     --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 -d 5
>>
>> Why all the options?
>> That should be covered in postgresql.conf, no?
>>
>> >
>> > This resulted in one of the three messages above, hence the post here.
>> > If I can start a single instance, I should be fine, since I could then
>> > either 1) replicate over to the other two, or 2) take a dump,
>> > reinitialize all the databases, and restore the dump.
>> >
>>
>> What if you move the recovery.conf file out?
>
>
> Will try.
>
>
>>
>> The below looks like missing/corrupted/incorrect files. Hard to tell
>> without knowing what Patroni did?
>
>
> Storage disappeared from underneath these clusters.  The OS was of course
> still in memory making futile attempts to write to disk, which would never
> complete.
>
> My best guess is that Patroni or postgres was in the middle of some
> writes across the clusters when the failure occurred.
>

Of note are the characters "f2W" below. I see nothing in the postgres
source code to indicate this is any recognizable postgres message. Part of
me suspects that the postgres binaries got corrupted; I had this happen
once with glib-common, and a reinstall fixed it. However, the checksums of
the postgres binaries match a standalone install perfectly, so that should
not be the issue.
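
As far as I can tell from the source, the text after "syntax error in
history file:" is the offending line itself, so "f2W" is more likely
something read from a damaged .history file than a message from the
binaries. Each line of such a file should hold a numeric parent timeline
ID, a switchpoint LSN (X/X) and a free-text reason; blank lines and #
comments are skipped. Below is a rough sketch to scan those files. It
assumes PG 10's pg_wal/ layout, check_history_files is a hypothetical
helper name, and it only approximates the server's parser:

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate the checks PostgreSQL applies when it
# reads timeline history files, for every *.history file in a WAL directory.
# Usage: check_history_files /data/patroni/pg_wal
check_history_files() {
  local dir=$1
  set -- "$dir"/*.history
  if [ ! -e "$1" ]; then
    echo "no history files in $dir"
    return 0
  fi
  awk '
    FNR == 1 { last = 0 }                  # timeline IDs restart per file
    /^[ \t]*$/ || /^[ \t]*#/ { next }      # blank lines and comments are skipped
    $1 !~ /^[0-9]+$/ {
      printf "%s:%d: expected a numeric timeline ID: %s\n", FILENAME, FNR, $0
      bad = 1; next
    }
    $2 !~ /^[0-9A-Fa-f]+\/[0-9A-Fa-f]+$/ {
      printf "%s:%d: expected a switchpoint LSN (X/X): %s\n", FILENAME, FNR, $0
      bad = 1; next
    }
    $1 + 0 <= last {
      printf "%s:%d: timeline IDs not increasing: %s\n", FILENAME, FNR, $0
      bad = 1; next
    }
    { last = $1 + 0 }
    END { exit bad }                       # non-zero if any line looked wrong
  ' "$@"
}
```
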



>
>>
>> > Using the above procedure I get one of three error messages when using
>> > the data files of each node:
>> >
>> > [ PSQL01 ]
>> > postgres: postgres: startup process waiting for 000000010000000000000008
>> >
>> > [ PSQL02 ]
>> > PANIC:  replication checkpoint has wrong magic 0 instead of 307747550
>> >
>> > [ PSQL03 ]
>> > FATAL:  syntax error in history file: f2W
>> >
>> > And I can't start any one of them.
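
On the PSQL02 error: if I read the source correctly, "replication
checkpoint has wrong magic" is raised while reading
pg_logical/replorigin_checkpoint at startup, and 307747550 is 0x1257DADE,
the magic number expected at the start of that file. A magic of 0 would
mean the file begins with zero bytes, which fits the storage failure. A
quick sketch to inspect it; check_replorigin_magic is a hypothetical
helper, and it assumes od decodes the value in host byte order, the same
way the server wrote it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the first 4 bytes of
# pg_logical/replorigin_checkpoint with 307747550 (0x1257DADE).
# od -t u4 decodes in host byte order, matching how the server wrote it.
# Usage: check_replorigin_magic /data/patroni
check_replorigin_magic() {
  local f="$1/pg_logical/replorigin_checkpoint"
  local magic
  if [ ! -f "$f" ]; then
    echo "missing: $f"
    return 2
  fi
  magic=$(od -An -t u4 -N 4 "$f" | tr -d ' \n')
  if [ "$magic" = "307747550" ]; then
    echo "ok: $f"
    return 0
  fi
  echo "bad magic ${magic:-<none>} in $f (expected 307747550)"
  return 1
}
```
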
>> >
>> >
>> >
>> >      >
>> >      > Thx,
>> >      > TK
>> >      >
>> >
>> >
>> >
>> >     --
>> >     Adrian Klaver
>> >     adrian.kla...@aklaver.com
>> >
>>
>>
>
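
On the PSQL01 state: a WAL segment file name is three 8-digit hex fields
(timeline, then the high and low parts of the segment number), so
000000010000000000000008 is the segment on timeline 1 that starts at LSN
0/8000000 with the default 16 MiB segments. Since the startup process is
waiting rather than failing, it seems to be in recovery looking for WAL
that is not there, which is consistent with the suggestion to try without
recovery.conf. A small decoding sketch; wal_segment_info is a hypothetical
helper and the 16 MiB segment size is an assumption unless overridden:

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a WAL segment file name (TTTTTTTTXXXXXXXXYYYYYYYY)
# into its timeline and starting LSN.
# Assumes the default 16 MiB segment size unless a size in bytes is passed.
# Usage: wal_segment_info 000000010000000000000008 [segment_size_bytes]
wal_segment_info() {
  local name=$1 segsize=${2:-16777216}
  if [[ ! $name =~ ^[0-9A-Fa-f]{24}$ ]]; then
    echo "not a WAL segment name: $name" >&2
    return 1
  fi
  local tli=$((16#${name:0:8}))
  local log=$((16#${name:8:8}))
  local seg=$((16#${name:16:8}))
  # segments per 4 GiB "log" id, as in the server's XLogSegmentsPerXLogId
  local segno=$(( log * (0x100000000 / segsize) + seg ))
  local lsn=$(( segno * segsize ))
  printf 'timeline %d, segment starts at LSN %X/%X\n' \
    "$tli" $(( lsn >> 32 )) $(( lsn & 0xFFFFFFFF ))
}
```

For example, wal_segment_info 000000010000000000000008 prints
"timeline 1, segment starts at LSN 0/8000000".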
