Tomas - Well, when I run the restore_command manually it works (archive dir exists on the secondary..). Thank for the explanation on the system catalogs..
Greg - My restore command copy the wals from archive dir in the primary to an archive dir in the secondary(different from the pg_xlog in the secondary). Should I run it manually and see if the archives are copied to the archive dir in the secondary or should I just copy all of them to the xlog dir in the secondary ? I tried to start the secondary as a primary (I have a backup..) but I still got an error (invalid checkpoint record from primary./ secondary). Does it means that my backup is corrupted ? בתאריך יום ג׳, 21 במאי 2019 ב-16:07 מאת Tomas Vondra < tomas.von...@2ndquadrant.com>: > On Tue, May 21, 2019 at 12:01:31PM +0300, Mariel Cherkassky wrote: > >Tomas : > > > >Well, when you say it does not work, why do you think so? Does it print > >some error, or what? Does it even get executed? It does not seem to be > >the case, judging by the log (there's no archive_command message). > > > >How was the "secondary machine" created? You said you have all the WAL > >since then - how do you know that? > > > >Well, when I start the secondary in recovery mode (the primary is down, > >auto failover is disabled..) it doesnt start recovering the archive wals > >from the primary. The logs of the secondary : > >receiving incremental file list > >rsync: link_stat "/var/lib/pgsql/archive/00000002.history" failed: No such > >file or directory (2) > > > >sent 8 bytes received 10 bytes 36.00 bytes/sec > >total size is 0 speedup is 0.00 > >rsync error: some files/attrs were not transferred (see previous errors) > >(code 23) at main.c(1505) [receiver=3.0.6] > >sh: /var/lib/pgsql/archive/00000002.history: No such file or directory > >2019-05-20 09:41:33 PDT 18558 LOG: entering standby mode > >2019-05-20 09:41:33 PDT 18558 LOG: invalid primary checkpoint record > >2019-05-20 09:41:33 PDT 18558 LOG: invalid secondary checkpoint link in > >control file > >2019-05-20 09:41:33 PDT 18558 PANIC: could not locate a valid > checkpoint > >record > >2019-05-20 09:41:33 PDT 18555 LOG: startup process (PID 18558) was > >terminated by signal 6: Aborted > >2019-05-20 09:41:33 PDT 18555 LOG: aborting startup due to startup > >process failure > >2019-05-20 09:41:33 PDT 18555 LOG: database system is shut down > >2019-05-20 09:56:12 PDT 18701 LOG: database system was shut down in > >recovery at 2019-05-01 09:40:02 PDT > > > >As I said, the secondary was down for a month and I have all the archives > >of the wals in my primary. I was hoping that the secondary will use the > >restore_command to restore them : > >restore_command = 'rsync -avzhe ssh postgres@X.X.X.X > :/var/lib/pgsql/archive/%f > >/var/lib/pgsql/archive/%f ; gunzip < /var/lib/pgsql/archive/%f > %p' > > > >my archive_command on the primary was : > >archive_command = 'gzip < %p > /var/lib/pgsql/archive/%f' > > > >Am I missing something ? > > > > First of all, the way you quote message is damn confusing - there's no > clear difference between your message and the message you quote. I don't > know which mail client you're using, but I suppose it can be configured to > quote sensibly ... > > Well, clearly the standby tries to fetch WAL from archive, but the rsync > command fails for some reason. You're in the position to investigate > further, because you can run it manually - we can't. This has nothing to > do with PostgreSQL. My guess is you don't have /var/lib/pgsql/archive on > the standby, and it's confusing because archive uses the same path. > > > >Another question, If I'll run initdb and initiate a new cluster and i'll > >copy the data files of my old cluster into the new one, is there any > chance > >that it will work ? > >I mean right now, my primary is down and cant start up because it is > >missing an offset file in the pg_multixtrans/offset dir. > > > > No, because you won't have contents of system catalogs, mapping the data > files to relations (tables, indexes) and containing information about the > structure (which columns / data types are in the data). > > The data files are pretty useless on their own. It might be possible to do > some manualy recovery - say, you might create the same tables in the new > schema, and then guess which data files belong to them. But there are > various caveats e.g. due to dropped columns, etc. > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > >