Re: Trying to handle db corruption 9.6

Mariel Cherkassky Tue, 21 May 2019 06:18:01 -0700

Tomas - Well, when I run the restore_command manually it works (the archive
dir exists on the secondary). Thanks for the explanation of the system
catalogs.


Greg - My restore command copies the WALs from the archive dir on the primary
to an archive dir on the secondary (different from pg_xlog on the
secondary). Should I run it manually and see if the archives are copied to
the archive dir on the secondary, or should I just copy all of them to the
pg_xlog dir on the secondary?
I tried to start the secondary as a primary (I have a backup) but I still
got an error (invalid primary / secondary checkpoint record). Does it
mean that my backup is corrupted?
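
For the manual test, here is a minimal sketch of how the restore_command
could be split into a small wrapper script, so that a missing staging dir, a
failed fetch and a failed gunzip each show up as a separate non-zero exit.
The function name restore_wal and the variables SRC, ARCHIVE_DIR and FETCH
are hypothetical, not something from my current setup; the host and paths are
the ones from my restore_command.

```shell
#!/bin/sh
# Hypothetical wrapper around the restore_command from this thread.
# All names below (restore_wal, SRC, ARCHIVE_DIR, FETCH) are assumptions.

SRC="${SRC:-postgres@X.X.X.X:/var/lib/pgsql/archive}"  # archive on the primary
ARCHIVE_DIR="${ARCHIVE_DIR:-/var/lib/pgsql/archive}"   # staging dir on the standby
FETCH="${FETCH:-rsync -az -e ssh}"                     # fetch command, overridable

# restore_wal FILE TARGET
#   FILE   is what PostgreSQL passes as %f (WAL segment or .history file)
#   TARGET is what PostgreSQL passes as %p
restore_wal() {
    f="$1"
    p="$2"

    # Fail early, with a clear message, if the staging dir is missing.
    if [ ! -d "$ARCHIVE_DIR" ]; then
        echo "staging dir $ARCHIVE_DIR does not exist" >&2
        return 1
    fi

    # Fetch the compressed file. A non-zero exit here tells PostgreSQL
    # that the requested file is not available in the archive.
    $FETCH "$SRC/$f" "$ARCHIVE_DIR/$f" || return 1

    # Decompress to the target; do not leave a partial file behind.
    if ! gunzip < "$ARCHIVE_DIR/$f" > "$p"; then
        rm -f "$p"
        return 1
    fi
}
```

If the function is saved in a script that ends with restore_wal "$@", the
recovery.conf entry would be something like
restore_command = '/path/to/script %f %p' (the path is a placeholder).
Requests for files that are not in the archive, such as 00000002.history,
would then just return non-zero, which is expected.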

On Tue, 21 May 2019 at 16:07, Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote:

> On Tue, May 21, 2019 at 12:01:31PM +0300, Mariel Cherkassky wrote:
> >Tomas :
> >
> >Well, when you say it does not work, why do you think so? Does it print
> >some error, or what? Does it even get executed? It does not seem to be
> >the case, judging by the log (there's no archive_command message).
> >
> >How was the "secondary machine" created? You said you have all the WAL
> >since then - how do you know that?
> >
> >Well, when I start the secondary in recovery mode (the primary is down,
> >auto failover is disabled), it doesn't start recovering the archived WALs
> >from the primary. The logs of the secondary:
> >receiving incremental file list
> >rsync: link_stat "/var/lib/pgsql/archive/00000002.history" failed: No such
> >file or directory (2)
> >
> >sent 8 bytes  received 10 bytes  36.00 bytes/sec
> >total size is 0  speedup is 0.00
> >rsync error: some files/attrs were not transferred (see previous errors)
> >(code 23) at main.c(1505) [receiver=3.0.6]
> >sh: /var/lib/pgsql/archive/00000002.history: No such file or directory
> >2019-05-20 09:41:33 PDT  18558  LOG:  entering standby mode
> >2019-05-20 09:41:33 PDT  18558  LOG:  invalid primary checkpoint record
> >2019-05-20 09:41:33 PDT  18558  LOG:  invalid secondary checkpoint link in
> >control file
> >2019-05-20 09:41:33 PDT  18558  PANIC:  could not locate a valid checkpoint
> >record
> >2019-05-20 09:41:33 PDT  18555  LOG:  startup process (PID 18558) was
> >terminated by signal 6: Aborted
> >2019-05-20 09:41:33 PDT  18555  LOG:  aborting startup due to startup
> >process failure
> >2019-05-20 09:41:33 PDT  18555  LOG:  database system is shut down
> >2019-05-20 09:56:12 PDT  18701  LOG:  database system was shut down in
> >recovery at 2019-05-01 09:40:02 PDT
> >
> >As I said, the secondary was down for a month and I have all the archives
> >of the WALs on my primary. I was hoping that the secondary would use the
> >restore_command to restore them:
> >restore_command = 'rsync -avzhe ssh postgres@X.X.X.X:/var/lib/pgsql/archive/%f
> >/var/lib/pgsql/archive/%f ; gunzip < /var/lib/pgsql/archive/%f > %p'
> >
> >my archive_command on the primary was :
> >archive_command = 'gzip < %p > /var/lib/pgsql/archive/%f'
> >
> >Am I missing something ?
> >
>
> First of all, the way you quote messages is damn confusing - there's no
> clear difference between your message and the message you quote. I don't
> know which mail client you're using, but I suppose it can be configured to
> quote sensibly ...
>
> Well, clearly the standby tries to fetch WAL from archive, but the rsync
> command fails for some reason. You're in the position to investigate
> further, because you can run it manually - we can't. This has nothing to
> do with PostgreSQL. My guess is you don't have /var/lib/pgsql/archive on
> the standby, and it's confusing because archive uses the same path.
>
>
> >Another question: if I run initdb to initialize a new cluster and then
> >copy the data files of my old cluster into the new one, is there any chance
> >that it will work?
> >I mean right now, my primary is down and can't start up because it is
> >missing an offset file in the pg_multixact/offsets dir.
> >
>
> No, because you won't have contents of system catalogs, mapping the data
> files to relations (tables, indexes) and containing information about the
> structure (which columns / data types are in the data).
>
> The data files are pretty useless on their own. It might be possible to do
> some manual recovery - say, you might create the same tables in the new
> schema, and then guess which data files belong to them. But there are
> various caveats, e.g. due to dropped columns, etc.
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
