Re: Concurrency issue in pg_rewind

2020-10-07 Thread Stephen Frost
Greetings,

* Andrey M. Borodin (x4...@yandex-team.ru) wrote:
> > On 18 Sep 2020, at 11:59, Michael Paquier wrote:
> > On Fri, Sep 18, 2020 at 11:31:26AM +0500, Andrey M. Borodin wrote:
> >> This is whole point of having prefetch. restore_command just links
> >> file from the same partition.

Re: Concurrency issue in pg_rewind

2020-10-07 Thread Stephen Frost
Greetings,

* Heikki Linnakangas (hlinn...@iki.fi) wrote:
> On 18/09/2020 10:17, Alexander Kukushkin wrote:
> >At the same time, pg_rewind due to such "fatal" error leaves PGDATA in
> >an inconsistent state with empty pg_control file, this is totally bad
> >and easily fixable. We want the specific…

Re: Concurrency issue in pg_rewind

2020-09-28 Thread Heikki Linnakangas
On 18/09/2020 10:17, Alexander Kukushkin wrote:
> At the same time, pg_rewind due to such "fatal" error leaves PGDATA in
> an inconsistent state with empty pg_control file, this is totally bad
> and easily fixable. We want the specific file to be absent and it is
> already absent, why should it be a fata…
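The argument quoted above (the file should be absent, it already is, so failing is pointless) amounts to treating "no such file" as success when removing a file. A minimal sketch of that idea follows, written in Python for brevity; it is not pg_rewind's actual C code, and the helper name `remove_if_present` is hypothetical. It only covers the concurrent-removal case and does nothing about the empty pg_control file also mentioned.

```python
import os
import tempfile


def remove_if_present(path: str) -> bool:
    """Remove `path`, treating "already absent" as success.

    Returns True if this call removed the file, False if it was already gone.
    Any other OSError (e.g. permission denied, path is a directory) still
    propagates, so genuine failures are not silently ignored.
    """
    try:
        os.unlink(path)
        return True
    except FileNotFoundError:
        # Another process (for example a restore_command or prefetch tool)
        # may have removed the file concurrently; the desired end state,
        # "file does not exist", already holds.
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        segment = os.path.join(d, "segment")  # hypothetical file name
        open(segment, "w").close()
        print(remove_if_present(segment))  # True: this call removed it
        print(remove_if_present(segment))  # False: it was already gone
```

The design choice is to narrow the tolerated error to ENOENT only; widening it to every OSError would hide real problems such as permission errors.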

Re: Concurrency issue in pg_rewind

2020-09-18 Thread Andrey M. Borodin
> On 18 Sep 2020, at 11:59, Michael Paquier wrote:
>
> On Fri, Sep 18, 2020 at 11:31:26AM +0500, Andrey M. Borodin wrote:
>> This is whole point of having prefetch. restore_command just links
>> file from the same partition.
>
> If this stuff is willing to do so, you may have your reaso…

Re: Concurrency issue in pg_rewind

2020-09-18 Thread Oleksandr Shulgin
On Fri, Sep 18, 2020 at 8:10 AM Michael Paquier wrote:
> On Thu, Sep 17, 2020 at 10:20:16AM +0200, Oleksandr Shulgin wrote:
> > Ouch. I think pg_rewind shouldn't try to remove any random files in pg_wal
> > that it doesn't know about.
> > What if the administrator made a backup of some WAL seg…

Re: Concurrency issue in pg_rewind

2020-09-18 Thread Alexander Kukushkin
Hi,

On Fri, 18 Sep 2020 at 08:59, Michael Paquier wrote:
> If this stuff is willing to do so, you may have your reasons, but even
> if you wish to locate both pg_wal/ and the prefetch path in the same
> partition, I don't get why it is necessary to have the prefetch path
> included directly in p…

Re: Concurrency issue in pg_rewind

2020-09-18 Thread Michael Paquier
On Fri, Sep 18, 2020 at 11:31:26AM +0500, Andrey M. Borodin wrote:
> This is whole point of having prefetch. restore_command just links
> file from the same partition.

If this stuff is willing to do so, you may have your reasons, but even if you wish to locate both pg_wal/ and the prefetch path in…

Re: Concurrency issue in pg_rewind

2020-09-17 Thread Andrey M. Borodin
> On 18 Sep 2020, at 11:10, Michael Paquier wrote:
>
> On Thu, Sep 17, 2020 at 10:20:16AM +0200, Oleksandr Shulgin wrote:
>> Ouch. I think pg_rewind shouldn't try to remove any random files in pg_wal
>> that it doesn't know about.
>> What if the administrator made a backup of some WAL s…

Re: Concurrency issue in pg_rewind

2020-09-17 Thread Michael Paquier
On Thu, Sep 17, 2020 at 10:20:16AM +0200, Oleksandr Shulgin wrote:
> Ouch. I think pg_rewind shouldn't try to remove any random files in pg_wal
> that it doesn't know about.
> What if the administrator made a backup of some WAL segments there?

IMO, this would be a rather bad strategy anyway, so j…

Re: Concurrency issue in pg_rewind

2020-09-17 Thread Alexey Kondratov
On 2020-09-17 15:27, Alexander Kukushkin wrote:
On Thu, 17 Sep 2020 at 14:04, Alexey Kondratov wrote:
With --restore-target-wal pg_rewind is trying to call restore_command on its own and it can happen at two stages:

1) When pg_rewind is trying to find the last checkpoint preceding a divergen…

Re: Concurrency issue in pg_rewind

2020-09-17 Thread Alexander Kukushkin
On Thu, 17 Sep 2020 at 14:04, Alexey Kondratov wrote:
>
> Hm, I cannot understand why wal-g (or any other tool) is trying to run
> pg_rewind, while WAL copying (and prefetch) is still in progress? Why do
> not just wait until it is finished?

wal-g doesn't try to call pg_rewind. First, we called w…

Re: Concurrency issue in pg_rewind

2020-09-17 Thread Alexey Kondratov
On 2020-09-16 15:55, Alexander Kukushkin wrote:
Hello,

Today I bumped into an issue with pg_rewind which is not 100% clear where should be better fixed.
The first call of pg_rewind failed with the following message:
servers diverged at WAL location A76/39E55338 on timeline 132
could not open fi…

Re: Concurrency issue in pg_rewind

2020-09-17 Thread Oleksandr Shulgin
On Wed, Sep 16, 2020 at 2:55 PM Alexander Kukushkin wrote:
>
> The second time pg_rewind also failed, but the error looked differently:
> servers diverged at WAL location A76/39E55338 on timeline 132
> rewinding from last common checkpoint at A76/1EF254B8 on timeline 132
>
> could not remove file…

Concurrency issue in pg_rewind

2020-09-16 Thread Alexander Kukushkin
Hello,

Today I bumped into an issue with pg_rewind which is not 100% clear where should be better fixed.
The first call of pg_rewind failed with the following message:
servers diverged at WAL location A76/39E55338 on timeline 132
could not open file "/home/postgres/pgdata/pgroot/data/pg_wal/0