Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Tom Lane
Andres Freund writes: > What I dislike with what you committed is that the state you're > investigating during the pause isn't the one you're going to end up > recoveryApply == true. That seems dangerous to me, even if it's going to > be reworked in HEAD. Agreed, but it's been like that since the p

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Jeff Janes
On Wed, Dec 5, 2012 at 11:17 AM, Tom Lane wrote: > Jeff Janes writes: >> Right now if I'm doing a PITR and want to look around before blessing >> the restore, I have to: >> [ do painful stuff ] > > Yeah. The worst thing about this is the cost of stepping too far > forward, but I doubt we can do

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Andres Freund
On 2012-12-05 18:35:47 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2012-12-05 16:15:38 -0500, Tom Lane wrote: > >> That's fine, but the immediate question is what are we doing to fix > >> the back branches. I think everyone is clear that we should be testing > >> LocalHotStandbyActive r

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Simon Riggs
On 5 December 2012 22:23, Tom Lane wrote: > Robert Haas writes: >> On Wed, Dec 5, 2012 at 4:15 PM, Tom Lane wrote: >>> The argument for this is that although we might fetch a slightly stale >>> value of the shared variable, it can't be very stale --- certainly no >>> older than the spinlock acqu

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Tom Lane
Andres Freund writes: > On 2012-12-05 16:15:38 -0500, Tom Lane wrote: >> That's fine, but the immediate question is what are we doing to fix >> the back branches. I think everyone is clear that we should be testing >> LocalHotStandbyActive rather than precursor conditions to see if a pause >> is

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Tom Lane
Robert Haas writes: > On Wed, Dec 5, 2012 at 4:15 PM, Tom Lane wrote: >> The argument for this is that although we might fetch a slightly stale >> value of the shared variable, it can't be very stale --- certainly no >> older than the spinlock acquisition near the bottom of the previous >> iterat

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Robert Haas
On Wed, Dec 5, 2012 at 4:15 PM, Tom Lane wrote: > The argument for this is that although we might fetch a slightly stale > value of the shared variable, it can't be very stale --- certainly no > older than the spinlock acquisition near the bottom of the previous > iteration of the loop. And this

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Andres Freund
On 2012-12-05 16:15:38 -0500, Tom Lane wrote: > Simon Riggs writes: > > On 5 December 2012 18:48, Tom Lane wrote: > >> On further thought, it seems like recovery_pause_at_target is rather > >> misdesigned anyway, and taking recovery target parameters from > >> recovery.conf is an obsolete API tha

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Tom Lane
Simon Riggs writes: > Yep, that's fine. > Are you doing this or do you want me to? Don't mind either way. I've got a patch for most of it already, so happy to do it. regards, tom lane

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Simon Riggs
On 5 December 2012 21:15, Tom Lane wrote: > Simon Riggs writes: >> On 5 December 2012 18:48, Tom Lane wrote: >>> On further thought, it seems like recovery_pause_at_target is rather >>> misdesigned anyway, and taking recovery target parameters from >>> recovery.conf is an obsolete API that was d

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Tom Lane
Simon Riggs writes: > On 5 December 2012 18:48, Tom Lane wrote: >> On further thought, it seems like recovery_pause_at_target is rather >> misdesigned anyway, and taking recovery target parameters from >> recovery.conf is an obsolete API that was designed in a world before hot >> standby. What I

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Simon Riggs
On 5 December 2012 18:48, Tom Lane wrote: > I wrote: >> Andres Freund writes: >>> On 2012-12-05 17:24:42 +, Simon Riggs wrote: So ISTM that we should make recoveryStopsHere() return false while we are inconsistent. Problems solved. > >>> I prefer the previous (fixed) behaviour where

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Tom Lane
Jeff Janes writes: > Right now if I'm doing a PITR and want to look around before blessing > the restore, I have to: > [ do painful stuff ] Yeah. The worst thing about this is the cost of stepping too far forward, but I doubt we can do much about that --- WAL isn't reversible and I can't see us

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Andres Freund
On 2012-12-05 13:48:53 -0500, Tom Lane wrote: > I wrote: > > Andres Freund writes: > >> On 2012-12-05 17:24:42 +, Simon Riggs wrote: > >>> So ISTM that we should make recoveryStopsHere() return false while we > >>> are inconsistent. Problems solved. > > >> I prefer the previous (fixed) behavio

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Jeff Janes
On Wed, Dec 5, 2012 at 8:40 AM, Tom Lane wrote: > The real question here probably needs to be "what is the point of > recoveryPauseAtTarget in the first place?". I find it hard to envision > what's the point of pausing unless the user has an opportunity to > make a decision about whether to cont

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Tom Lane
I wrote: > Andres Freund writes: >> On 2012-12-05 17:24:42 +, Simon Riggs wrote: >>> So ISTM that we should make recoveryStopsHere() return false while we >>> are inconsistent. Problems solved. >> I prefer the previous (fixed) behaviour where we error out if we reach a >> recovery target befo

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Tom Lane
Andres Freund writes: > On 2012-12-05 17:24:42 +, Simon Riggs wrote: >> So ISTM that we should make recoveryStopsHere() return false while we >> are inconsistent. Problems solved. > I prefer the previous (fixed) behaviour where we error out if we reach a > recovery target before we are consis

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Andres Freund
On 2012-12-05 17:24:42 +, Simon Riggs wrote: > On 5 December 2012 17:17, Simon Riggs wrote: > > > The recovery target and the consistency point are in some ways in > > conflict. If the recovery target is before the consistency point there > > is no point in stopping there, whether or not we pa

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Simon Riggs
On 5 December 2012 17:17, Simon Riggs wrote: > The recovery target and the consistency point are in some ways in > conflict. If the recovery target is before the consistency point there > is no point in stopping there, whether or not we pause. What we should > do is say "recovery target reached,

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Simon Riggs
On 5 December 2012 16:40, Tom Lane wrote: > The real question here probably needs to be "what is the point of > recoveryPauseAtTarget in the first place?". I find it hard to envision > what's the point of pausing unless the user has an opportunity to > make a decision about whether to continue a

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Andres Freund
On 2012-12-05 18:08:01 +0100, Andres Freund wrote: > On 2012-12-05 11:40:16 -0500, Tom Lane wrote: > > Andres Freund writes: > > > Basically the whole logic around recoveryApply seems to be broken > > > currently. Because if recoveryApply=false we currently don't pause at > > > all because we j

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Andres Freund
On 2012-12-05 11:40:16 -0500, Tom Lane wrote: > Andres Freund writes: > > Basically the whole logic around recoveryApply seems to be broken > > currently. Because if recoveryApply=false we currently don't pause at > > all because we jump out of the apply loop with the break. > > Huh? That brea

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Tom Lane
Andres Freund writes: > Basically the whole logic around recoveryApply seems to be broken > currently. Because if recoveryApply=false we currently don't pause at > all because we jump out of the apply loop with the break. Huh? That break is after the pause: /*

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Andres Freund
On 2012-12-05 11:11:23 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2012-12-05 13:34:05 +, Simon Riggs wrote: > >> @@ -5883,6 +5889,17 @@ StartupXLOG(void) > >> } while (record != NULL && recoveryContinue); > >> > >> /* > >> + * We've reached stop point, but not yet

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Tom Lane
Andres Freund writes: > On 2012-12-05 13:34:05 +, Simon Riggs wrote: >> @@ -5883,6 +5889,17 @@ StartupXLOG(void) >> } while (record != NULL && recoveryContinue); >> >> /* >> + * We've reached stop point, but not yet applied last >> + * record. Pause AFT

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Simon Riggs
On 5 December 2012 14:33, Andres Freund wrote: > Independent of this patch, I am slightly confused about the whole stop > logic. Isn't the idea that you can stop/start/stop/start/... recovery? > Because if !recoveryApply we break out of the whole recovery loop and > are done with things. You can

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Andres Freund
On 2012-12-05 14:33:36 +, Simon Riggs wrote: > On 5 December 2012 13:34, Simon Riggs wrote: > > > Aboriginal bug extends back to 9.0. > > I don't see any bug in 9.0 and 9.1, just 9.2+ Well the pausing logic is clearly broken in 9.1 as well, isn't it? I.e. you will get: LOG: recovery has paus

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Simon Riggs
On 5 December 2012 13:34, Simon Riggs wrote: > Aboriginal bug extends back to 9.0. I don't see any bug in 9.0 and 9.1, just 9.2+ -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Andres Freund
On 2012-12-05 13:34:05 +, Simon Riggs wrote: > On 5 December 2012 02:27, Tom Lane wrote: > > Andres Freund writes: > >>> But the key is, the database was not actually consistent at that > >>> point, and so opening hot standby was a dangerous thing to do. > >>> > >>> The bug that allowed the d

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Tom Lane
Andres Freund writes: > On 2012-12-05 19:06:55 +0900, Tatsuo Ishii wrote: >> So what status are we on? Are we going to release 9.2.2 as it is? >> Or withdraw current 9.2.2? > Releasing as-is sounds good. As Tom wrote upthread: > On 2012-12-04 21:27:34 -0500, Tom Lane wrote: >> This is not a regr

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Simon Riggs
On 5 December 2012 02:27, Tom Lane wrote: > Andres Freund writes: >>> But the key is, the database was not actually consistent at that >>> point, and so opening hot standby was a dangerous thing to do. >>> >>> The bug that allowed the database to open early (the original topic of >>> this email c

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Andres Freund
On 2012-12-05 19:06:55 +0900, Tatsuo Ishii wrote: > So what status are we on? Are we going to release 9.2.2 as it is? > Or withdraw current 9.2.2? Releasing as-is sounds good. As Tom wrote upthread: On 2012-12-04 21:27:34 -0500, Tom Lane wrote: > This is not a regression because the pause logic i

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Tatsuo Ishii
So what status are we on? Are we going to release 9.2.2 as it is? Or withdraw current 9.2.2? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp > Andres Freund writes: >> On 2012-12-04 21:27:34 -0500, Tom Lane wrote: >>> So the upsh

Re: [BUGS] PITR potentially broken in 9.2

2012-12-05 Thread Simon Riggs
On 5 December 2012 00:35, Tom Lane wrote: > I wrote: >> So apparently this is something we broke since Nov 18. Don't know what >> yet --- any thoughts? > > Further experimentation shows that reverting commit > ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work. So there's > something wrong/i

Re: [BUGS] PITR potentially broken in 9.2

2012-12-04 Thread Tom Lane
Andres Freund writes: > On 2012-12-04 21:27:34 -0500, Tom Lane wrote: >> So the upshot is that I propose a patch more like the attached. > Without having run anything so far it looks good to me. BTW, while on the theme of the pause feature being several bricks shy of a load, it looks to me like

Re: [BUGS] PITR potentially broken in 9.2

2012-12-04 Thread Andres Freund
On 2012-12-04 21:27:34 -0500, Tom Lane wrote: > Andres Freund writes: > >> But the key is, the database was not actually consistent at that > >> point, and so opening hot standby was a dangerous thing to do. > >> > >> The bug that allowed the database to open early (the original topic of > >> this

Re: [BUGS] PITR potentially broken in 9.2

2012-12-04 Thread Tom Lane
Andres Freund writes: >> But the key is, the database was not actually consistent at that >> point, and so opening hot standby was a dangerous thing to do. >> >> The bug that allowed the database to open early (the original topic of >> this email chain) was masking this secondary issue. > Could

Re: [BUGS] PITR potentially broken in 9.2

2012-12-04 Thread Andres Freund
On 2012-12-04 18:05:15 -0800, Jeff Janes wrote: > On Tue, Dec 4, 2012 at 4:20 PM, Tom Lane wrote: > > Jeff Janes writes: > >> I've reproduced it again using the just-tagged 9.2.2, and uploaded a > >> 135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to google > >> drive. The data directo

Re: [BUGS] PITR potentially broken in 9.2

2012-12-04 Thread Jeff Janes
On Tue, Dec 4, 2012 at 4:35 PM, Tom Lane wrote: > I wrote: >> So apparently this is something we broke since Nov 18. Don't know what >> yet --- any thoughts? > > Further experimentation shows that reverting commit > ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work. So there's > something w

Re: [BUGS] PITR potentially broken in 9.2

2012-12-04 Thread Andres Freund
On 2012-12-04 19:20:44 -0500, Tom Lane wrote: > Jeff Janes writes: > > I've reproduced it again using the just-tagged 9.2.2, and uploaded a > > 135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to google > > drive. The data directory contains the recovery.conf which is set to > > end reco

Re: [BUGS] PITR potentially broken in 9.2

2012-12-04 Thread Jeff Janes
On Tue, Dec 4, 2012 at 4:20 PM, Tom Lane wrote: > Jeff Janes writes: >> I've reproduced it again using the just-tagged 9.2.2, and uploaded a >> 135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to google >> drive. The data directory contains the recovery.conf which is set to >> end recov

Re: [BUGS] PITR potentially broken in 9.2

2012-12-04 Thread Andres Freund
On 2012-12-04 19:35:48 -0500, Tom Lane wrote: > I wrote: > > So apparently this is something we broke since Nov 18. Don't know what > > yet --- any thoughts? > > Further experimentation shows that reverting commit > ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work. So there's > something wr

Re: [BUGS] PITR potentially broken in 9.2

2012-12-04 Thread Tom Lane
I wrote: > So apparently this is something we broke since Nov 18. Don't know what > yet --- any thoughts? Further experimentation shows that reverting commit ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work. So there's something wrong/incomplete about that fix. This is a bit urgent since

Re: [BUGS] PITR potentially broken in 9.2

2012-12-04 Thread Tom Lane
Jeff Janes writes: > I've reproduced it again using the just-tagged 9.2.2, and uploaded a > 135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to google > drive. The data directory contains the recovery.conf which is set to > end recovery between the two critical time points. Hmmm ... I c

Re: [BUGS] PITR potentially broken in 9.2

2012-12-04 Thread Jeff Janes
On Sun, Dec 2, 2012 at 1:02 PM, Tom Lane wrote: > Jeff Janes writes: >> On Sat, Dec 1, 2012 at 1:56 PM, Tom Lane wrote: >>> I'm confused. Are you now saying that this problem only exists in >>> 9.1.x? I tested current HEAD because you indicated the problem was >>> still there. > >> No, I'm say

Re: [BUGS] PITR potentially broken in 9.2

2012-12-02 Thread Tom Lane
Jeff Janes writes: > On Sat, Dec 1, 2012 at 1:56 PM, Tom Lane wrote: >> I'm confused. Are you now saying that this problem only exists in >> 9.1.x? I tested current HEAD because you indicated the problem was >> still there. > No, I'm saying the problem exists both in 9.1.x and in hypothetical

Re: [BUGS] PITR potentially broken in 9.2

2012-12-01 Thread Jeff Janes
On Sat, Dec 1, 2012 at 1:56 PM, Tom Lane wrote: > Jeff Janes writes: >> On Sat, Dec 1, 2012 at 12:47 PM, Tom Lane wrote: >>> Jeff Janes writes: In the newly fixed 9_2_STABLE, that problem still shows up the same as it does in 9.1.6. > >>> I tried to reproduce this as per your directio

Re: [BUGS] PITR potentially broken in 9.2

2012-12-01 Thread Tom Lane
Jeff Janes writes: > On Sat, Dec 1, 2012 at 12:47 PM, Tom Lane wrote: >> Jeff Janes writes: >>> In the newly fixed 9_2_STABLE, that problem still shows up the same as >>> it does in 9.1.6. >> I tried to reproduce this as per your directions, and see no problem in >> HEAD. Recovery advances to

Re: [BUGS] PITR potentially broken in 9.2

2012-12-01 Thread Jeff Janes
On Sat, Dec 1, 2012 at 12:47 PM, Tom Lane wrote: > Jeff Janes writes: >> On Wed, Nov 28, 2012 at 7:51 AM, Tom Lane wrote: >>> Is this related at all to the problem discussed over at >>> http://archives.postgresql.org/pgsql-general/2012-11/msg00709.php >>> ? The conclusion-so-far in that thread

Re: [BUGS] PITR potentially broken in 9.2

2012-12-01 Thread Tom Lane
Jeff Janes writes: > On Wed, Nov 28, 2012 at 7:51 AM, Tom Lane wrote: >> Is this related at all to the problem discussed over at >> http://archives.postgresql.org/pgsql-general/2012-11/msg00709.php >> ? The conclusion-so-far in that thread seems to be that an error >> ought to be thrown for reco

Re: [BUGS] PITR potentially broken in 9.2

2012-11-28 Thread Jeff Janes
On Wed, Nov 28, 2012 at 7:51 AM, Tom Lane wrote: > Heikki Linnakangas writes: >> On 28.11.2012 06:27, Noah Misch wrote: >>> I observed a similar problem with 9.2. Despite a restore_command that >>> failed >>> every time, startup from a hot backup completed. At the time, I suspected a >>> mista

Re: [BUGS] PITR potentially broken in 9.2

2012-11-28 Thread Jeff Janes
On Wed, Nov 28, 2012 at 5:37 AM, Heikki Linnakangas wrote: > On 28.11.2012 15:26, Andres Freund wrote: >> > > >> Can you reproduce the issue? If so, can you give an exact guide? If not, >> do you still have the datadir et al. from above? Yes, it is reliable enough to be used for "git bisect" rm

Re: [BUGS] PITR potentially broken in 9.2

2012-11-28 Thread Tom Lane
Heikki Linnakangas writes: > On 28.11.2012 06:27, Noah Misch wrote: >> I observed a similar problem with 9.2. Despite a restore_command that failed >> every time, startup from a hot backup completed. At the time, I suspected a >> mistake on my part. > I believe this was caused by this little ty

Re: [BUGS] PITR potentially broken in 9.2

2012-11-28 Thread Andres Freund
On 2012-11-28 16:34:55 +0200, Heikki Linnakangas wrote: > On 28.11.2012 15:47, Andres Freund wrote: > >I mean the label read by read_backup_label(). Jeff's mail indicated it > >had CHECKPOINT_LOCATION at 1/188D8120 but redo started at 1/CD89E48. > > That's correct. The checkpoint was at 1/188D8120

Re: [BUGS] PITR potentially broken in 9.2

2012-11-28 Thread Heikki Linnakangas
On 28.11.2012 15:47, Andres Freund wrote: > I mean the label read by read_backup_label(). Jeff's mail indicated it > had CHECKPOINT_LOCATION at 1/188D8120 but redo started at 1/CD89E48. That's correct. The checkpoint was at 1/188D8120, but its redo pointer was earlier, at 1/CD89E48, so that's whe
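The ordering of the two WAL locations quoted in the message above can be checked mechanically. The following is a minimal sketch, assuming the usual "X/Y" notation in which both fields are hexadecimal and the first field is the high 32 bits; `parse_lsn` is a hypothetical helper written for this illustration, not a PostgreSQL function.

```python
def parse_lsn(text: str) -> int:
    """Convert an 'X/Y' WAL location (two hex fields) to one integer.

    Assumption: X is the high 32 bits and Y the low 32 bits.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


# Values taken from the message above.
checkpoint_location = parse_lsn("1/188D8120")
redo_pointer = parse_lsn("1/CD89E48")

# The redo pointer is numerically smaller, so replay has to begin
# before the checkpoint record itself.
redo_is_earlier = redo_pointer < checkpoint_location
print(redo_is_earlier)
```

Comparing the strings directly would be misleading here, because "1/CD89E48" has a shorter second field than "1/188D8120"; converting to integers first avoids that trap.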

Re: [BUGS] PITR potentially broken in 9.2

2012-11-28 Thread Andres Freund
On 2012-11-28 15:37:38 +0200, Heikki Linnakangas wrote: > On 28.11.2012 15:26, Andres Freund wrote: > >Hm. Are you sure it's actually reading your backup file? It's hard to say > >without DEBUG1 output but I would tentatively say it's not reading it at > >all because the "redo starts at ..." messa

Re: [BUGS] PITR potentially broken in 9.2

2012-11-28 Thread Heikki Linnakangas
On 28.11.2012 15:26, Andres Freund wrote: > Hm. Are you sure it's actually reading your backup file? It's hard to say > without DEBUG1 output but I would tentatively say it's not reading it at > all because the "redo starts at ..." message indicates it's not using the checkpoint location from the backu

Re: [BUGS] PITR potentially broken in 9.2

2012-11-28 Thread Heikki Linnakangas
On 28.11.2012 06:27, Noah Misch wrote: > On Tue, Nov 27, 2012 at 10:08:12AM -0800, Jeff Janes wrote: >> Doing PITR in 9.2.1, the system claims that it reached a consistent >> recovery state immediately after redo starts. >> This leads to various mysterious failures, when it should instead >> throw a "reque

Re: [BUGS] PITR potentially broken in 9.2

2012-11-28 Thread Andres Freund
On 2012-11-27 10:08:12 -0800, Jeff Janes wrote: > Doing PITR in 9.2.1, the system claims that it reached a consistent > recovery state immediately after redo starts. > This leads to various mysterious failures, when it should instead > throw a "requested recovery stop point is before consistent

Re: [BUGS] PITR potentially broken in 9.2

2012-11-27 Thread Noah Misch
On Tue, Nov 27, 2012 at 10:08:12AM -0800, Jeff Janes wrote: > Doing PITR in 9.2.1, the system claims that it reached a consistent > recovery state immediately after redo starts. > This leads to various mysterious failures, when it should instead > throw a "requested recovery stop point is before

[BUGS] PITR potentially broken in 9.2

2012-11-27 Thread Jeff Janes
Doing PITR in 9.2.1, the system claims that it reached a consistent recovery state immediately after redo starts. This leads to various mysterious failures, when it should instead throw a "requested recovery stop point is before consistent recovery point" error. (If you are unlucky, I think it m
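The behaviour the report expects can be modelled in a few lines. This is only a toy sketch of the rule described in the report, not PostgreSQL's implementation: `RecoveryError`, `finish_recovery`, and the integer WAL positions are all hypothetical names and values chosen for illustration.

```python
class RecoveryError(Exception):
    """Hypothetical error type standing in for a fatal startup error."""


def finish_recovery(stop_position: int, consistent_position: int) -> str:
    """Toy model: refuse to end recovery before consistency is reached.

    Both arguments are WAL positions expressed as plain integers.
    """
    if stop_position < consistent_position:
        # Stopping here would expose a database that is not yet consistent.
        raise RecoveryError(
            "requested recovery stop point is before consistent recovery point"
        )
    return "recovery complete"


# A stop target that falls before the consistency point must be rejected.
try:
    finish_recovery(stop_position=100, consistent_position=250)
    outcome = "accepted"
except RecoveryError:
    outcome = "rejected"
print(outcome)
```

The point of the sketch is the direction of the comparison: a stop target at or beyond the consistency point is acceptable, while one before it has to fail loudly instead of letting the server open.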