RE: Unable to start replica after failover

2022-09-06 Thread Lahnov, Igor
As far as I understand, according to the logs, the last leader does not yet know about the new timeline and it is trying to download the full log from the previous timeline. It seems there should be a conflict that the partial file already exists locally when restoring in this case, but this does

Re: Unable to start replica after failover

2022-09-06 Thread Alexander Kukushkin
On Tue, Sep 6, 2022, 08:46 Lahnov, Igor wrote: > Do you think it is possible to add a check to the restore command > that a partial or full file already exists? > > Or is disabling the restore command a possible solution in this case? > My opinion didn't change, pg_probackup does a weird t

Re: Unable to start replica after failover

2022-08-30 Thread Alexander Kukushkin
Hi Igor, On Fri, 26 Aug 2022 at 13:43, Lahnov, Igor wrote: > I can't answer your question. How do you think recovery from the > archive should work in this case? > > 1. the partial file should not be restored in any case > > 2. the partial file should recover as a partial and replace the

RE: Unable to start replica after failover

2022-08-26 Thread Lahnov, Igor
I can't answer your question. How do you think recovery from the archive should work in this case? 1. the partial file should not be restored in any case 2. the partial file should recover as a partial and replace the local partial 3. the recovery command should return a conflict - file alread
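
To make the options above concrete, here is a minimal sketch of a decision function that combines option 1 (never substitute a .partial file for a requested full segment) with option 3 (report a conflict when the segment already exists locally). Everything in it is hypothetical: the names Decision and decide_restore, and the idea of passing file listings as plain lists, are assumptions for illustration only. It does not describe how pg_probackup or PostgreSQL actually behave.

```python
from enum import Enum


class Decision(Enum):
    RESTORE = "restore"      # copy the requested segment from the archive
    NOT_FOUND = "not_found"  # report failure so recovery can fall back to streaming
    CONFLICT = "conflict"    # a full or partial copy already exists locally


def decide_restore(requested, archive_files, local_files):
    """Decide what a hypothetical restore wrapper should do for one segment.

    requested     -- segment name asked for by the server (no .partial suffix)
    archive_files -- names present in the archive (assumed listing)
    local_files   -- names present in the local pg_wal directory (assumed listing)
    """
    local = set(local_files)
    # Option 3: refuse to overwrite a local full or partial copy.
    if requested in local or requested + ".partial" in local:
        return Decision.CONFLICT
    # Option 1: only an exact name match counts; "<name>.partial" is never
    # renamed and handed back as if it were the full segment.
    if requested in set(archive_files):
        return Decision.RESTORE
    return Decision.NOT_FOUND
```

A wrapper built on this would map Decision.RESTORE to exit status 0 after copying the file and the other two outcomes to a nonzero status, which is how restore_command signals that a file is not available.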

Re: Unable to start replica after failover

2022-08-24 Thread Alexander Kukushkin
Hi, On Wed, 24 Aug 2022 at 13:37, Lahnov, Igor wrote: > > > Yes, Postgres asks for 0002054E00FB and gets renamed > 0002054E00FB.partial (without *partial* postfix). > But why? This is totally weird and unexpected behavior. Why is pg_probackup doing this? Regards, -- Al

RE: Unable to start replica after failover

2022-08-24 Thread Lahnov, Igor
Hi, Yes, the *partial* from the *new leader* is restored to *last leader* and renamed to 0002054E00FB, without *partial* postfix. >>Postgres asks for file 0002054E00FB but somehow gets >>0002054E00FB.partial instead. Why? Yes, Postgres asks for 0002054E

Re: Unable to start replica after failover

2022-08-23 Thread Alexander Kukushkin
Hi, On Tue, 23 Aug 2022 at 16:31, Lahnov, Igor wrote: > > Our 'restore_command' on *previous leader* restores a partial file from > archive (from *new leader*) > > -> > 2022-05-23 01:50:14 [123730]: [1-1]: INFO: pg_probackup archive-get WAL > file: 0002054E00FB, remote: ssh, threads:

RE: Unable to start replica after failover

2022-08-23 Thread Lahnov, Igor
We know what the problem is, but don't know how to solve it correctly. After failover, *new leader* promoted and read local partial log to LSN 54E/FB348118 -> 2022-05-23 01:47:52.124 [12088] LOG: record with incorrect prev-link 0/100 at 54E/FB348118 2022-05-23 01:47:52.124 [12088] LOG: redo don

Re: Unable to start replica after failover

2022-08-22 Thread Kyotaro Horiguchi
> What additional information is needed? Usually server logs and the output of pg_rewind at the trouble time are needed as the first step. > Next, pg_rewind returns errors while reading the log from the backup > back, looking for the last checkpoint, which is quite reasonable > because, once a ne

RE: Unable to start replica after failover

2022-08-15 Thread Lahnov, Igor
What additional information is needed?

Re: Unable to start replica after failover

2022-08-04 Thread Kyotaro Horiguchi
At Fri, 29 Jul 2022 15:01:44 +, "Lahnov, Igor" wrote in > * "could not find previous WAL record at E6F/C2436F50: invalid > resource manager ID 139 at E6F/C2436F50"; or > * "could not find previous WAL record at 54E/FB348118: unexpected > pageaddr 54E/7B34A000 in log segment
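
The error messages quoted above identify positions by LSN, while the archive is addressed by WAL segment file name. As a rough aid for relating the two, the sketch below computes the segment file name that contains a given LSN. It assumes the default 16 MiB segment size, and the timeline ID is a caller-supplied assumption because the snippets do not state it; wal_segment_name is a hypothetical helper, not a PostgreSQL or pg_probackup function.

```python
def wal_segment_name(lsn, timeline, segment_size=16 * 1024 * 1024):
    """Return the 24-character WAL segment file name containing `lsn`.

    lsn          -- text form "HIGH/LOW" in hexadecimal, e.g. "54E/FB348118"
    timeline     -- timeline ID (an assumption supplied by the caller)
    segment_size -- WAL segment size in bytes (16 MiB is the default build value)
    """
    high_text, low_text = lsn.split("/")
    position = (int(high_text, 16) << 32) | int(low_text, 16)
    segment_number = position // segment_size
    # Number of segments that fit in one 4 GiB "log file" unit.
    segments_per_id = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_id,
        segment_number % segments_per_id,
    )


# Timeline 2 is assumed here purely for illustration.
print(wal_segment_name("54E/FB348118", timeline=2))  # 000000020000054E000000FB
```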