Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread Kyotaro HORIGUCHI
Everything seems settled up above my head while sleeping.. Sorry for the clumsy test script, and thank you for refining it, Mitsumasa. And thank you for fixing the bug and the detailed explanation, Heikki. I confirmed that the problem is fixed also for me at origin/REL9_2_STABLE. > I understand thi

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread KONDO Mitsumasa
(2013/03/07 19:41), Heikki Linnakangas wrote: On 07.03.2013 10:05, KONDO Mitsumasa wrote: (2013/03/06 16:50), Heikki Linnakangas wrote:> Yeah. That fix isn't right, though; XLogPageRead() is supposed to return true on success, and false on error, and the patch makes it return 'true' on error, i

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread Heikki Linnakangas
On 07.03.2013 10:05, KONDO Mitsumasa wrote: (2013/03/06 16:50), Heikki Linnakangas wrote:> Yeah. That fix isn't right, though; XLogPageRead() is supposed to return true on success, and false on error, and the patch makes it return 'true' on error, if archive recovery was requested but we're stil

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread KONDO Mitsumasa
(2013/03/06 16:50), Heikki Linnakangas wrote: > Hi, Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord(); Attempt patch records minRecoveryPoint. [crash recovery -> record minRecoveryPoint in control file -> archive recovery] I think that this is an original intention of Heik

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-06 Thread Heikki Linnakangas
On 05.03.2013 14:09, KONDO Mitsumasa wrote: Hi, Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord(); Attempt patch records minRecoveryPoint. [crash recovery -> record minRecoveryPoint in control file -> archive recovery] I think that this is an original intention of Heikki'

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread Kyotaro HORIGUCHI
Hi, I suppose the attached patch is close to the solution. > I think that this is an original intention of Heikki's patch. I noticed that archive recovery will be turned on in next_record_is_invalid thanks to your patch. > On the other hand, your patch fixes that point but ReadRecord > runs on t

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread Kyotaro HORIGUCHI
Hmm.. > Horiguchi's patch does not seem to record minRecoveryPoint in > ReadRecord(); > Attempt patch records minRecoveryPoint. > [crash recovery -> record minRecoveryPoint in control file -> archive > recovery] > I think that this is an original intention of Heikki's patch. It could be. Before th

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread KONDO Mitsumasa
Hi, Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord(); Attempt patch records minRecoveryPoint. [crash recovery -> record minRecoveryPoint in control file -> archive recovery] I think that this is an original intention of Heikki's patch. I also found a bug in latest 9.2_st

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread Kyotaro HORIGUCHI
Sorry, I sent the wrong script. > The head of origin/REL9_2_STABLE shows the behavior I mentioned in > the last message when using the shell script attached. 9.3dev > runs as expected. regards, -- Kyotaro Horiguchi NTT Open Source Software Center #! /bin/sh pgpath="$HOME/bin/pgsql_924b" echo $PATH |

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread Kyotaro HORIGUCHI
Hello, I could reproduce the behavior and might understand the cause. The head of origin/REL9_2_STABLE shows the behavior I mentioned in the last message when using the shell script attached. 9.3dev runs as expected. In XLogPageRead, when RecPtr goes beyond the last page, the current xlog file is rele

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-04 Thread Kyotaro HORIGUCHI
This is an interim report for this patch. We found that PostgreSQL with this patch unexpectedly becomes primary when starting up as standby. We'll investigate the behavior further. > > Anyway, I've committed this to master and 9.2 now. > > This seems to fix the issue. We'll examine this

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-26 Thread Josh Berkus
Folks, Is there any way this particular issue could cause data corruption without causing a crash? I don't see a way for it to do so, but I wanted to verify. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-25 Thread Kyotaro HORIGUCHI
Although this has become useless, I want to explain how this works. > > I tried to postpone smgrtruncate TO the next checkpoint. > > Umm, why? I don't understand this patch at all. This inhibits truncating files after (quite vague in the patch:-) the previous checkpoint by hindering the dele

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-25 Thread Kyotaro HORIGUCHI
At Fri, 22 Feb 2013 11:42:39 +0200, Heikki Linnakangas wrote in <51273d8f.7060...@vmware.com> > On 15.02.2013 10:33, Kyotaro HORIGUCHI wrote: > > In an HA DB cluster consisting of Pacemaker and PostgreSQL, PostgreSQL > > is stopped by 'pg_ctl stop -m i' regardless of situation. > > That seems like a b

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-25 Thread Kyotaro HORIGUCHI
Hello, > Anyway, I've committed this to master and 9.2 now. This seems to fix the issue. We'll examine this further. Thank you. -- Kyotaro Horiguchi NTT Open Source Software Center

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-22 Thread Heikki Linnakangas
On 15.02.2013 10:33, Kyotaro HORIGUCHI wrote: Sorry, I omitted to show how we found this issue. In an HA DB cluster consisting of Pacemaker and PostgreSQL, PostgreSQL is stopped by 'pg_ctl stop -m i' regardless of situation. That seems like a bad idea. If nothing else, crash recovery can take a lon

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-22 Thread Heikki Linnakangas
On 14.02.2013 19:18, Fujii Masao wrote: Yes. And the resource agent for streaming replication in Pacemaker (it's the OSS clusterware) is the user of that archive recovery scenario, too. When it starts up the server, it always creates the recovery.conf and starts the server as the standby. It cann

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-22 Thread Heikki Linnakangas
On 22.02.2013 02:13, Michael Paquier wrote: On Thu, Feb 21, 2013 at 11:09 PM, Heikki Linnakangas< hlinnakan...@vmware.com> wrote: On 15.02.2013 15:49, Heikki Linnakangas wrote: Attached is a patch for git master. The basic idea is to split InArchiveRecovery into two variables, InArchiveRecov

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-21 Thread Michael Paquier
On Thu, Feb 21, 2013 at 11:09 PM, Heikki Linnakangas < hlinnakan...@vmware.com> wrote: > On 15.02.2013 15:49, Heikki Linnakangas wrote: > >> Attached is a patch for git master. The basic idea is to split >> InArchiveRecovery into two variables, InArchiveRecovery and >> ArchiveRecoveryRequested. Ar

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-21 Thread Heikki Linnakangas
On 15.02.2013 15:49, Heikki Linnakangas wrote: Attached is a patch for git master. The basic idea is to split InArchiveRecovery into two variables, InArchiveRecovery and ArchiveRecoveryRequested. ArchiveRecoveryRequested is set when recovery.conf exists. But if we don't know how far we need to re
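A toy Python model of the two-flag idea in the excerpt above (not PostgreSQL source). Only the names InArchiveRecovery and ArchiveRecoveryRequested come from the message; `RecoveryState`, `start_recovery` and `reached_end_of_local_wal` are hypothetical. Because the excerpt is cut off, the switch-over rule is an assumption based on the "[crash recovery -> record minRecoveryPoint in control file -> archive recovery]" sequence quoted elsewhere in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryState:
    # Flag names follow the message; the rest of this model is hypothetical.
    archive_recovery_requested: bool = False   # recovery.conf exists
    in_archive_recovery: bool = False          # actually replaying from the archive
    min_recovery_point: Optional[int] = None   # None means "not recorded"


def start_recovery(recovery_conf_exists: bool,
                   min_recovery_point: Optional[int]) -> RecoveryState:
    """Decide the initial mode at startup."""
    state = RecoveryState(min_recovery_point=min_recovery_point)
    state.archive_recovery_requested = recovery_conf_exists
    # Assumption: enter archive recovery immediately only when we already
    # know how far replay must go; otherwise begin with crash recovery.
    state.in_archive_recovery = (
        recovery_conf_exists and min_recovery_point is not None
    )
    return state


def reached_end_of_local_wal(state: RecoveryState, end_of_wal: int) -> RecoveryState:
    """Called when crash recovery has replayed all locally available WAL."""
    if state.archive_recovery_requested and not state.in_archive_recovery:
        # Record the consistency point first, then switch modes.
        state.min_recovery_point = end_of_wal
        state.in_archive_recovery = True
    return state
```

With `recovery.conf` present but no recorded minRecoveryPoint, the model starts in crash recovery and switches to archive recovery only after the end of local WAL has been recorded.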

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-20 Thread Heikki Linnakangas
On 20.02.2013 10:01, Kyotaro HORIGUCHI wrote: Sorry, Let me correct a bit. I tried to postpone smgrtruncate after the next checkpoint. This I tried to postpone smgrtruncate TO the next checkpoint. Umm, why? I don't understand this patch at all. - Heikki

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-20 Thread Kyotaro HORIGUCHI
Sorry, Let me correct a bit. > I tried to postpone smgrtruncate after the next checkpoint. This I tried to postpone smgrtruncate TO the next checkpoint. > is similar to what hotstandby feedback does to vacuum. It seems > to be working fine but I worry that it might also bloat the > table. I h

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-19 Thread Kyotaro HORIGUCHI
Hello, I looked at this from another point of view. I consider the current discussion to be based on how to predict the last consistency point. But there is another aspect of this issue. I tried to postpone smgrtruncate after the next checkpoint. This is similar to what hotstandby feedback does to v

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-19 Thread Ants Aasma
On Mon, Feb 18, 2013 at 8:27 PM, Heikki Linnakangas wrote: > backupStartPoint is set, which signals recovery to wait for an end-of-backup > record, until the system is considered consistent. If the backup is taken > from a hot standby, backupEndPoint is set, instead of inserting an > end-of-backup

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-18 Thread Heikki Linnakangas
On 16.02.2013 10:40, Ants Aasma wrote: On Fri, Feb 15, 2013 at 3:49 PM, Heikki Linnakangas wrote: While this solution would help solve my issue, it assumes that the correct amount of WAL files are actually there. Currently the docs for setting up a standby refer to "24.3.4. Recovering Using a

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-16 Thread Ants Aasma
On Fri, Feb 15, 2013 at 3:49 PM, Heikki Linnakangas wrote: >> While this solution would help solve my issue, it assumes that the >> correct amount of WAL files are actually there. Currently the docs for >> setting up a standby refer to "24.3.4. Recovering Using a Continuous >> Archive Backup", and

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-15 Thread Heikki Linnakangas
On 15.02.2013 13:05, Ants Aasma wrote: On Wed, Feb 13, 2013 at 10:52 PM, Simon Riggs wrote: The problem is that we startup Hot Standby before we hit the min recovery point because that isn't recorded. For me, the thing to do is to make the min recovery point == end of WAL when state is DB_IN_PR

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-15 Thread Ants Aasma
On Wed, Feb 13, 2013 at 10:52 PM, Simon Riggs wrote: > The problem is that we startup Hot Standby before we hit the min > recovery point because that isn't recorded. For me, the thing to do is > to make the min recovery point == end of WAL when state is > DB_IN_PRODUCTION. That way we don't need t

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-15 Thread Kyotaro HORIGUCHI
Sorry, I omitted to show how we found this issue. In an HA DB cluster consisting of Pacemaker and PostgreSQL, PostgreSQL is stopped by 'pg_ctl stop -m i' regardless of situation. On the other hand, PostgreSQL RA (Resource Agent) is obliged to start the master node via hot standby state because of the res

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-14 Thread Fujii Masao
On Thu, Feb 14, 2013 at 5:52 AM, Simon Riggs wrote: > On 13 February 2013 09:04, Heikki Linnakangas wrote: > >> Without step 3, the server would perform crash recovery, and it would work. >> But because of the recovery.conf file, the server goes into archive >> recovery, and because minRecoveryPo

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-14 Thread Fujii Masao
On Thu, Feb 14, 2013 at 5:15 AM, Heikki Linnakangas wrote: > On 13.02.2013 17:02, Tom Lane wrote: >> >> Heikki Linnakangas writes: >>> >>> At least in back-branches, I'd call this a pilot error. You can't turn a >>> master into a standby just by creating a recovery.conf file. At least >>> not if

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-14 Thread Ants Aasma
On Feb 13, 2013 10:29 PM, "Heikki Linnakangas" wrote: > Hmm, I just realized a little problem with that approach. If you take a base backup using an atomic filesystem backup from a running server, and start archive recovery from that, that's essentially the same thing as Kyotaro's test case. Coin

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Simon Riggs
On 13 February 2013 09:04, Heikki Linnakangas wrote: > Without step 3, the server would perform crash recovery, and it would work. > But because of the recovery.conf file, the server goes into archive > recovery, and because minRecoveryPoint is not set, it assumes that the > system is consistent

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas
On 13.02.2013 17:02, Tom Lane wrote: Heikki Linnakangas writes: At least in back-branches, I'd call this a pilot error. You can't turn a master into a standby just by creating a recovery.conf file. At least not if the master was not shut down cleanly first. ... I'm not sure that's worth the tro

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas writes: > On 13.02.2013 21:30, Tom Lane wrote: >> Well, archive recovery is a different scenario --- Simon was questioning >> whether we need a minRecoveryPoint mechanism in crash recovery, or at >> least that's what I thought he asked. > Ah, ok. The short answer to that is "no

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas writes: > The problem we're trying to solve is determining how much WAL needs to > be replayed until the database is consistent again. In crash recovery, > the answer is "all of it". That's why the CRC in the WAL is essential; > it's required to determine where the WAL ends.
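The point about the CRC marking where the WAL ends can be illustrated with a small self-contained Python sketch. It is a toy log format, not PostgreSQL's on-disk WAL layout: the 8-byte header, `append_record` and `replay_all` are all hypothetical, and `zlib.crc32` is used only as a convenient stand-in checksum.

```python
import struct
import zlib

# Toy record header: payload length, then CRC-32 of the payload.
HEADER = struct.Struct("<II")


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one checksummed record to the in-memory log."""
    log += HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay_all(log: bytes) -> list:
    """Replay records until the first one that fails validation.

    In this model, "crash recovery" has no stored end position: the first
    record whose CRC does not match is treated as the end of the log.
    """
    records = []
    pos = 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = log[start:start + length]
        # Toy rule: zero length means unwritten space; a short or
        # mismatching payload means a torn or stale record.
        if length == 0 or len(payload) != length or zlib.crc32(payload) != crc:
            break
        records.append(bytes(payload))
        pos = start + length
    return records
```

A record whose payload was only partly written fails the CRC check, so replay stops cleanly at the last complete record instead of applying garbage.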

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas
On 13.02.2013 21:30, Tom Lane wrote: Heikki Linnakangas writes: On 13.02.2013 21:21, Tom Lane wrote: It would only be broken if someone interrupted a crash recovery mid-flight and tried to establish a recovery stop point before the end of WAL, no? Why don't we just forbid that case? This woul

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas writes: > On 13.02.2013 21:21, Tom Lane wrote: >> It would only be broken if someone interrupted a crash recovery >> mid-flight and tried to establish a recovery stop point before the end >> of WAL, no? Why don't we just forbid that case? This would either be >> the same as, or

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas
On 13.02.2013 21:03, Tom Lane wrote: Simon Riggs writes: On 13 February 2013 09:04, Heikki Linnakangas wrote: To be precise, we'd need to update the control file on every XLogFlush(), like we do during archive recovery. That would indeed be unacceptable from a performance point of view. Updat

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas
On 13.02.2013 21:21, Tom Lane wrote: Heikki Linnakangas writes: Well, no-one's complained about the performance. From a robustness point of view, it might be good to keep the minRecoveryPoint value in a separate file, for example, to avoid rewriting the control file that often. Then again, why

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas writes: > Well, no-one's complained about the performance. From a robustness point > of view, it might be good to keep the minRecoveryPoint value in a > separate file, for example, to avoid rewriting the control file that > often. Then again, why fix it when it's not broken.

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas
On 13.02.2013 20:25, Simon Riggs wrote: On 13 February 2013 09:04, Heikki Linnakangas wrote: To be precise, we'd need to update the control file on every XLogFlush(), like we do during archive recovery. That would indeed be unacceptable from a performance point of view. Updating the control fi

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Simon Riggs writes: > On 13 February 2013 09:04, Heikki Linnakangas wrote: >> To be precise, we'd need to update the control file on every XLogFlush(), >> like we do during archive recovery. That would indeed be unacceptable from a >> performance point of view. Updating the control file that ofte

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Simon Riggs
On 13 February 2013 09:04, Heikki Linnakangas wrote: > To be precise, we'd need to update the control file on every XLogFlush(), > like we do during archive recovery. That would indeed be unacceptable from a > performance point of view. Updating the control file that often would also > be bad for

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas writes: > At least in back-branches, I'd call this a pilot error. You can't turn a > master into a standby just by creating a recovery.conf file. At least > not if the master was not shut down cleanly first. > ... > I'm not sure that's worth the trouble, though. Perhaps it wou

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas
On 13.02.2013 09:46, Kyotaro HORIGUCHI wrote: In this case, the FINAL consistency point is at the XLOG_SMGR_TRUNCATE record, but the current implementation does not record the consistency point (checkpoint, or commit or smgr_truncate) itself, so we cannot predict the final consistency point on starting of

[HACKERS] 9.2.3 crashes during archive recovery

2013-02-12 Thread Kyotaro HORIGUCHI
Hello, 9.2.3 crashes during archive recovery. This was also corrected at some point on origin/master with another problem fixed by the commit below if my memory is correct. But current HEAD and 9.2.3 crash during archive recovery (not on standby) by the 'marking deleted page visible' problem. h