Re: PANIC during crash recovery of a recently promoted standby

2018-07-05 Thread Michael Paquier
On Thu, Jul 05, 2018 at 01:03:14PM +0530, Pavan Deolasee wrote: > Many thanks Michael for doing the gruelling of coming up with a more > complete fix, verifying all the cases, in various back branches. No problem. I hope I got the credits right. If there is anything wrong please feel free to let

Re: PANIC during crash recovery of a recently promoted standby

2018-07-05 Thread Pavan Deolasee
On Thu, Jul 5, 2018 at 7:20 AM, Michael Paquier wrote: > On Mon, Jul 02, 2018 at 10:41:05PM +0900, Michael Paquier wrote: > > I am planning to finish wrapping this patch luckily on Wednesday JST > > time, or in the worst case on Thursday. I got this problem on my mind > > for a couple of days no

Re: PANIC during crash recovery of a recently promoted standby

2018-07-04 Thread Michael Paquier
On Mon, Jul 02, 2018 at 10:41:05PM +0900, Michael Paquier wrote: > I am planning to finish wrapping this patch luckily on Wednesday JST > time, or in the worst case on Thursday. I got this problem on my mind > for a couple of days now and I could not find a case where the approach > taken could ca

Re: PANIC during crash recovery of a recently promoted standby

2018-07-02 Thread Michael Paquier
On Mon, Jul 02, 2018 at 04:25:13PM +0900, Kyotaro HORIGUCHI wrote: > When minRecoveryPoint is invalid, there're only two possible > cases. It may be at the very beginning of archive recovery or may be > running a crash recovery. In the latter case, we have detected > crash recovery before redo starts. S

Re: PANIC during crash recovery of a recently promoted standby

2018-07-02 Thread Kyotaro HORIGUCHI
Hello. At Fri, 22 Jun 2018 15:25:48 +0900, Michael Paquier wrote in <20180622062548.ge5...@paquier.xyz> > On Fri, Jun 22, 2018 at 02:34:02PM +0900, Kyotaro HORIGUCHI wrote: > > Hello, sorry for the absence and I looked at the second patch. > > Thanks for the review! > > > At Fri, 22 Jun 2018 13:4

Re: PANIC during crash recovery of a recently promoted standby

2018-06-27 Thread Michael Paquier
Adding Heikki and Andres in CC here for awareness.. On Wed, Jun 27, 2018 at 05:29:38PM +0900, Michael Paquier wrote: > I have spent a bit of time testing this on HEAD, 10 and 9.6. For 9.5, > 9.4 and 9.3 I have reproduced the failure and tested the patch, but I > lacked time to perform more tests.

Re: PANIC during crash recovery of a recently promoted standby

2018-06-27 Thread Michael Paquier
On Fri, Jun 22, 2018 at 03:25:48PM +0900, Michael Paquier wrote: > On Fri, Jun 22, 2018 at 02:34:02PM +0900, Kyotaro HORIGUCHI wrote: >> Hello, sorry for the absence and I looked at the second patch. > > Thanks for the review! I have been spending some time testing and torturing the patch for all st

Re: PANIC during crash recovery of a recently promoted standby

2018-06-21 Thread Michael Paquier
On Fri, Jun 22, 2018 at 02:34:02PM +0900, Kyotaro HORIGUCHI wrote: > Hello, sorry for the absence and I looked at the second patch. Thanks for the review! > At Fri, 22 Jun 2018 13:45:21 +0900, Michael Paquier > wrote in <20180622044521.gc5...@paquier.xyz> >> long as crash recovery runs. And XLogNe

Re: PANIC during crash recovery of a recently promoted standby

2018-06-21 Thread Kyotaro HORIGUCHI
Hello, sorry for the absence and I looked at the second patch. At Fri, 22 Jun 2018 13:45:21 +0900, Michael Paquier wrote in <20180622044521.gc5...@paquier.xyz> > On Fri, Jun 22, 2018 at 10:08:24AM +0530, Pavan Deolasee wrote: > > On Fri, Jun 22, 2018 at 9:28 AM, Michael Paquier > > wrote: > >> So

Re: PANIC during crash recovery of a recently promoted standby

2018-06-21 Thread Michael Paquier
On Fri, Jun 22, 2018 at 10:08:24AM +0530, Pavan Deolasee wrote: > On Fri, Jun 22, 2018 at 9:28 AM, Michael Paquier > wrote: >> So an extra pair of eyes from another committer would be >> welcome. I am letting that cool down for a couple of days now. > > I am not a committer, so don't know if my

Re: PANIC during crash recovery of a recently promoted standby

2018-06-21 Thread Pavan Deolasee
On Fri, Jun 22, 2018 at 9:28 AM, Michael Paquier wrote: > > > This is not really a complicated patch, and it took a lot of energy from > me the last couple of days per the nature of the many scenarios to think > about... Thanks for the efforts. It wasn't an easy bug to chase to begin with. So I

Re: PANIC during crash recovery of a recently promoted standby

2018-06-21 Thread Michael Paquier
On Thu, Jun 07, 2018 at 07:58:29PM +0900, Kyotaro HORIGUCHI wrote: > (I believe that) By definition recovery doesn't end until the > end-of-recovery check point ends so from the viewpoint I think it > is wrong to clear ControlFile->minRecoveryPoint before the end. > > Invalid-page checking during

Re: PANIC during crash recovery of a recently promoted standby

2018-06-19 Thread Michael Paquier
On Thu, Jun 07, 2018 at 07:58:29PM +0900, Kyotaro HORIGUCHI wrote: > Invalid-page checking during crash recovery is harmful rather than > useless. It is done by CheckRecoveryConsistency even in crash > recovery against its expectation because there's a case where > minRecoveryPoint is valid but InAr

Re: PANIC during crash recovery of a recently promoted standby

2018-06-07 Thread Kyotaro HORIGUCHI
Hello. At Thu, 24 May 2018 16:57:07 +0900, Michael Paquier wrote in <20180524075707.ge15...@paquier.xyz> > On Mon, May 14, 2018 at 01:14:22PM +0530, Pavan Deolasee wrote: > > Looks like I didn't understand Alvaro's comment when he mentioned it to me > > off-list. But I now see what Michael and A

Re: PANIC during crash recovery of a recently promoted standby

2018-05-24 Thread Michael Paquier
On Mon, May 14, 2018 at 01:14:22PM +0530, Pavan Deolasee wrote: > Looks like I didn't understand Alvaro's comment when he mentioned it to me > off-list. But I now see what Michael and Alvaro mean and that indeed seems > like a problem. I was thinking that the test for (ControlFile->state == > DB_IN

Re: PANIC during crash recovery of a recently promoted standby

2018-05-14 Thread Pavan Deolasee
On Fri, May 11, 2018 at 8:39 PM, Alvaro Herrera wrote: > Michael Paquier wrote: > > On Thu, May 10, 2018 at 10:52:12AM +0530, Pavan Deolasee wrote: > > > I propose that we should always clear the minRecoveryPoint after > promotion > > > to ensure that crash recovery always run to the end if a jus

Re: PANIC during crash recovery of a recently promoted standby

2018-05-13 Thread Michael Paquier
On Sat, May 12, 2018 at 07:41:33AM +0900, Michael Paquier wrote: > pg_ctl promote would wait for the control file to be updated, so you > cannot use it in the TAP tests to trigger the promotion. Still I think > I found one after waking up? Please note I have not tested it: > - Use a custom trigge

Re: PANIC during crash recovery of a recently promoted standby

2018-05-11 Thread Michael Paquier
On Fri, May 11, 2018 at 12:09:58PM -0300, Alvaro Herrera wrote: > Yeah, I had this exact comment, but I was unable to come up with a test > case that would cause a problem. pg_ctl promote would wait for the control file to be updated, so you cannot use it in the TAP tests to trigger the promotion.

Re: PANIC during crash recovery of a recently promoted standby

2018-05-11 Thread Alvaro Herrera
Michael Paquier wrote: > On Thu, May 10, 2018 at 10:52:12AM +0530, Pavan Deolasee wrote: > > I propose that we should always clear the minRecoveryPoint after promotion > > to ensure that crash recovery always run to the end if a just-promoted > > standby crashes before completing its first regular

Re: PANIC during crash recovery of a recently promoted standby

2018-05-10 Thread Michael Paquier
On Thu, May 10, 2018 at 10:52:12AM +0530, Pavan Deolasee wrote: > I propose that we should always clear the minRecoveryPoint after promotion > to ensure that crash recovery always run to the end if a just-promoted > standby crashes before completing its first regular checkpoint. A WIP patch > is at