Fwd: Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write
Hi,

I have reported a bug via the PostgreSQL bug report form, but haven't received any response so far. This might not be a bug, but rather a feature that is not implemented yet. I have created a suggestion for a small addition to StartupXLOG() in xlog.c to solve the issue. Any suggestions?

--
Leif Gunnar Erlandsen

Forwarded message ---
From: l...@lako.no
To: "Jeff Janes" <jeff.ja...@gmail.com>, pgsql-b...@lists.postgresql.org
Sent: 12 January 2019 at 08:40
Subject: Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write

"Jeff Janes" <jeff.ja...@gmail.com> wrote on 11 January 2019 at 15:03:

> On Fri, Jan 11, 2019 at 4:08 AM PG Bug reporting form <nore...@postgresql.org> wrote:
>
>> PostgreSQL should have paused recovery also in the first scenario. Then I
>> could have added the missing WAL and continued with the restore.
>
> At the least, I think we should log an explicit WARNING if the WAL stream
> ends before the specified PIT is reached.

The documentation for recovery.conf states that with recovery_target_time set, recovery_target_action defaults to pause. Even if the stop point is not reached, pause should be activated.

After checking the source, this might be solved with a small addition to StartupXLOG() in xlog.c. Someone with more experience with the source code should check this out.

    if (reachedStopPoint)
    {
        if (!reachedConsistency)
            ereport(FATAL,
                    (errmsg("requested recovery stop point is before consistent recovery point")));

        /*
         * This is the last point where we can restart recovery with a
         * new recovery target, if we shutdown and begin again. After
         * this, Resource Managers may choose to do permanent
         * corrective actions at end of recovery.
         */
        switch (recoveryTargetAction)
        {
            case RECOVERY_TARGET_ACTION_SHUTDOWN:

                /*
                 * exit with special return code to request shutdown
                 * of postmaster. Log messages issued from
                 * postmaster.
                 */
                proc_exit(3);

            case RECOVERY_TARGET_ACTION_PAUSE:
                SetRecoveryPause(true);
                recoveryPausesHere();

                /* drop into promote */

            case RECOVERY_TARGET_ACTION_PROMOTE:
                break;
        }
    /* Add these lines */
    }
    else if (recoveryTarget == RECOVERY_TARGET_TIME)
    {
        /*
         * Stop point not reached, but the next WAL record could not be read.
         * Some explanation and warning should be logged.
         */
        switch (recoveryTargetAction)
        {
            case RECOVERY_TARGET_ACTION_PAUSE:
                SetRecoveryPause(true);
                recoveryPausesHere();
                break;
        }
    }
    /* until here */

--
Leif
Fwd: Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write
Hi,

I have reported a bug via the PostgreSQL bug report form, but haven't received any response so far. This might not be a bug, but rather a feature that is not implemented yet. I suggest making a small addition to StartupXLOG to solve the issue.

git diff src/backend/access/transam/xlog.c

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2ab7d804f0..d0e5bb3f84 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7277,6 +7277,19 @@ StartupXLOG(void)
             case RECOVERY_TARGET_ACTION_PROMOTE:
                 break;
+        }
+    } else if (recoveryTarget == RECOVERY_TARGET_TIME)
+    {
+        /*
+         * Stop point not reached but next WAL could not be read.
+         * Some explanation and warning should be logged.
+         */
+        switch (recoveryTargetAction)
+        {
+            case RECOVERY_TARGET_ACTION_PAUSE:
+                SetRecoveryPause(true);
+                recoveryPausesHere();
+                break;
         }
     }

The scenario I want to solve is:

- Need to restore a backup to another server.
- Restore the pg_basebackup files.
- Restore some WAL files.
- Extract the pg_basebackup files.
- Create recovery.conf with a point-in-time target.
- Start PostgreSQL.
- Recovery ends before the point-in-time target due to missing WAL files, and the database opens read/write.

I think the database should have paused recovery; then I could restore additional WAL files and restart PostgreSQL to continue recovery. With large databases and a lot of WAL files it is time consuming to repeat parts of the process.

Best regards
Leif Gunnar Erlandsen
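
For illustration, the recovery.conf created in the scenario above would contain something along these lines (pre-v12 style, since recovery.conf is used here; the archive path and timestamp are placeholders, not values from the report):

# recovery.conf -- placeholder values, for illustration only
restore_command = 'cp /path/to/archivedwal/%f %p'
recovery_target_time = '2019-01-10 12:00:00'
recovery_target_action = 'pause'    # 'pause' is the documented default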
Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write
"Kyotaro HORIGUCHI" skrev 31. januar 2019 kl. 13:28: > If this is acceptable I'll post complete version (including > documentation). I don't think this back-patcheable. > If you are asking me, then I think this is exactly what I wanted, thank you for your effort. >> With large databases and a lot of wal-files it is time consuming to repeat >> parts of the process. > > I understand your concern. > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center regards Leif Gunnar Erlandsen
Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write
"Michael Paquier" skrev 26. februar 2019 kl. 09:13: > On Thu, Jan 31, 2019 at 09:26:48PM +0900, Kyotaro HORIGUCHI wrote: > >> I don't think no one expected that server follows >> recovery_target_action without setting a target, so we can change >> the behavior when any kind of target is specified. So I propose >> to follow recovery_target_action even if not rached the target >> when any recovery target isspecified. > > Quoting the docs: > https://www.postgresql.org/docs/current/recovery-target-settings.html > recovery_target_action (enum) > "Specifies what action the server should take once the recovery target > is *reached*." I know this and recovery_target_action in my case was "pause". Recovery target was specified with a date and time. > So what we have now is that an action would be taken iff a stop point > is defined and reached. What this patch changes is that the action > would be taken even if the stop point has *not* been reached once the > end of a WAL stream is found. Yes, and this is expected behaviour in my use case. This was a PITR scenario, to a new server, and not crash recovery. I restored a backup and placed WAL-files in a separate directory, then I created a recovery.conf with correct recovery_target_time. After PostgreSQL started it stopped after a short while and opened the database in read/write. Checks showed target was not reached. Log showed that no more WAL could be found. If PostgreSQL had followed recovery_target_action, then I could have restored the missing WAL-files and continued replay of WAL. As this was not the case I had to restart the process from the beginning, this took many hours. Another thing to consider is that in instances such as this one, where a lot of WAL was needed for replay, it is not always given that we have the sufficient amount of available disk space in order to store them all at the same time. > Please do not take me wrong, I can see that there could be use cases > where it is possible to take an action at the end of a WAL stream if > there is less WAL than what was planned, perhaps if the OP has set > an incorrect stop position too far in the future, still too much WAL > would have been replayed so it would make the base backup unusable for > future uses. Also, it looks incorrect to me to change an existing > behavior and to use the same semantics for triggering an action if a > stop point is defined and reached. I did not set an incorrect stop position. I see this change as something most in a similar situation would expect from their database system. AFAIK the doc does not specify what happens if recovery_target_time is specified but not reached. But as default recovery_target_action is set to "pause" I would have assumed "pause" to be the action. regards Leif Gunnar Erlandsen
pause recovery if pitr target not reached
This patch allows PostgreSQL to pause recovery before the PITR target is reached if recovery_target_time is specified. Missing WAL could then be restored from backup and applied on the next restart. Today PostgreSQL opens the database read/write on a new timeline even when the PITR target is not reached.

make check was run with this patch, with the result "All 192 tests passed." The source used is from version 12b4.

For both examples below, "recovery_target_time = '2019-09-17 09:24:00'".

Log from today's behavior:

[20875] LOG: starting point-in-time recovery to 2019-09-17 09:24:00+02
[20875] LOG: restored log file "00010002" from archive
[20875] LOG: redo starts at 0/228
[20875] LOG: consistent recovery state reached at 0/2000100
[20870] LOG: database system is ready to accept read only connections
[20875] LOG: restored log file "00010003" from archive
[20875] LOG: restored log file "00010004" from archive
cp: cannot stat '/var/lib/pgsql/12/archivedwal/00010005': No such file or directory
[20875] LOG: redo done at 0/40080C8
[20875] LOG: last completed transaction was at log time 2019-09-17 09:13:10.524645+02
[20875] LOG: restored log file "00010004" from archive
cp: cannot stat '/var/lib/pgsql/12/archivedwal/0002.history': No such file or directory
[20875] LOG: selected new timeline ID: 2
[20875] LOG: archive recovery complete
cp: cannot stat '/var/lib/pgsql/12/archivedwal/0001.history': No such file or directory
[20870] LOG: database system is ready to accept connections

And with the patched source:

[20899] LOG: starting point-in-time recovery to 2019-09-17 09:24:00+02
[20899] LOG: restored log file "00010002" from archive
[20899] LOG: redo starts at 0/228
[20899] LOG: consistent recovery state reached at 0/20002B0
[20895] LOG: database system is ready to accept read only connections
[20899] LOG: restored log file "00010003" from archive
[20899] LOG: restored log file "00010004" from archive
cp: cannot stat '/var/lib/pgsql/12m/archivedwal/00010005': No such file or directory
[20899] LOG: Recovery target not reached but next WAL record could not be read
[20899] LOG: redo done at 0/4007D40
[20899] LOG: last completed transaction was at log time 2019-09-17 09:13:10.539546+02
[20899] LOG: recovery has paused
[20899] HINT: Execute pg_wal_replay_resume() to continue.

You can then restore WAL in several steps, and when the target is reached you get this log:

[21943] LOG: starting point-in-time recovery to 2019-09-17 09:24:00+02
[21943] LOG: restored log file "00010005" from archive
[21943] LOG: redo starts at 0/5003C38
[21943] LOG: consistent recovery state reached at 0/600
[21941] LOG: database system is ready to accept read only connections
[21943] LOG: restored log file "00010006" from archive
[21943] LOG: recovery stopping before commit of transaction 859, time 2019-09-17 09:24:02.58576+02
[21943] LOG: recovery has paused
[21943] HINT: Execute pg_wal_replay_resume() to continue.

Execute pg_wal_replay_resume() as hinted:

[21943] LOG: redo done at 0/6001830
[21943] LOG: last completed transaction was at log time 2019-09-17 09:23:57.496945+02
cp: cannot stat '/var/lib/pgsql/12m/archivedwal/0002.history': No such file or directory
[21943] LOG: selected new timeline ID: 2
[21943] LOG: archive recovery complete
cp: cannot stat '/var/lib/pgsql/12m/archivedwal/0001.history': No such file or directory
[21941] LOG: database system is ready to accept connections

Leif Gunnar Erlandsen

0001-pause-recovery-if-pitr-target-not-reached.patch
Description: Binary data
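
For completeness, the steps implied by the HINT above are ordinary SQL against the (still read-only) server; once the desired target has been reached and replay is paused, recovery can be finished as hinted (both functions are standard, not added by the patch):

-- Check that replay is paused, then let recovery finish.
SELECT pg_is_wal_replay_paused();
SELECT pg_wal_replay_resume();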
Re: pause recovery if pitr target not reached
>"Peter Eisentraut" skrev 6. november 2019 >kl. 08:32: > >> Btw., this discussion/patch seems related: >> https://www.postgresql.org/message-id/flat/a3f650f1-fb0f-c913-a000-a4671f12a...@postgrespro.ru I have read through this other proposal. As far as I could see in the suggested patch, it does not solve the same problem. It still stops recovery when the recovery process does not find any more WAL. I would like the process to pause so administrator get to choose to find more WAL to apply. My patch should probably be extended to include RECOVERY_TARGET_XID, RECOVERY_TARGET_NAME, RECOVERY_TARGET_LSN as well as RECOVERY_TARGET_TIME. --- Leif Gunnar Erlandsen
Re: pause recovery if pitr target not reached
Adding another patch, which is not only for recovery_target_time but also for xid, name and lsn.

> After studying this a bit more, I think the current behavior is totally bogus and needs a serious
> rethink.
>
> If you specify a recovery target and it is reached, recovery pauses (depending on
> recovery_target_action).
>
> If you specify a recovery target and it is not reached when the end of the archive is reached
> (i.e., restore_command fails), then recovery ends and the server is promoted, without any further
> information. This is clearly wrong in multiple ways.

Yes, that is why I have created the patch.

> I think what we should do is if we specify a recovery target and we don't reach it, we should
> ereport(FATAL). Somewhere around

Whether recovery pauses or a FATAL error is reported is not important, as long as it is possible to get some more WAL and continue recovery. Pause has the benefit of making it possible to inspect tables in the database.

> in StartupXLOG(), where we already check for other conditions that are undesirable at the end of
> recovery. Then a user can make fixes either by getting more WAL files to restore and adjusting the
> recovery target and starting again. I don't think pausing is the right behavior, but perhaps an
> argument could be made to offer it as a nondefault behavior.

Pausing was chosen in the patch because pause is the expected behavior when the target is reached. And the patch does not interfere with any other functionality as far as I know.

--
Leif Gunnar Erlandsen

0002-pause-recovery-if-pitr-target-not-reached.patch
Description: Binary data
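
For illustration, the kind of end-of-recovery check being discussed might look roughly like the following, reusing the variables already quoted earlier in the thread; this is a sketch of the idea, not the attached patch or any committed code:

    /*
     * Sketch: a recovery target was configured, but the end of the available
     * WAL was reached before that target, so refuse to finish recovery here.
     */
    if (recoveryTarget != RECOVERY_TARGET_UNSET && !reachedStopPoint)
        ereport(FATAL,
                (errmsg("recovery ended before configured recovery target was reached")));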
Re: pause recovery if pitr target not reached
"Kyotaro Horiguchi" skrev 22. november 2019 kl. 05:26: > Hello, Lief, Peter. > > At Thu, 21 Nov 2019 12:50:18 +, "Leif Gunnar Erlandsen" > wrote in > >> Adding another patch which is not only for recovery_target_time but also for >> xid, name and lsn. >> >> After studying this a bit more, I think the current behavior is totally >> bogus and needs a serious >> rethink. >> >> If you specify a recovery target and it is reached, recovery pauses >> (depending on >> recovery_target_action). >> >> If you specify a recovery target and it is not reached when the end of the >> archive is reached >> (i.e., restore_command fails), then recovery ends and the server is >> promoted, without any further >> information. This is clearly wrong in multiple ways. >> >> Yes, that is why I have created the patch. > > It seems premising to be used in prepeated trial-and-error recovery by > well experiecned operators. When it is used, I think that the target > goes back gradually through repetitions so anyway we need to start > from a clean backup for each repetition, in the expected > usage. Unintended promotion doesn't harm in the case. If going back in time and gradually recover less WAL todays behaiviour is adequate. The patch is for circumstances where for some reason you do not have all the WAL's ready at once. > > In this persipective, I don't think the behavior is totally wrong but > FATAL'ing at EO-WAL before target seems good to do. > >> I think what we should do is if we specify a recovery target and we don't >> reach it, we should >> ereport(FATAL). Somewhere around >> >> If recovery pauses or a FATAL error is reported, is not important, as long >> as it is possible to get >> some more WAL and continue recovery. Pause has the benefit of the >> possibility to inspect tables in >> the database. >> >> in StartupXLOG(), where we already check for other conditions that are >> undesirable at the end of >> recovery. Then a user can make fixes either by getting more WAL files to >> restore and adjusting the >> recovery target and starting again. I don't think pausing is the right >> behavior, but perhaps an >> argument could be made to offer it as a nondefault behavior. >> >> Pausing was choosen in the patch as pause was the expected behaivior if >> target was reached. >> >> And the patch does not interfere with any other functionality as far as I >> know. > > With the current behavior, if server promotes without stopping as told > by target_action variables, it is a sign that something's wrong. But > if server pauses before reaching target, operators may overlook the > message if they don't know of the behavior. And if server poses in the > case, I think there's nothing to do. Yes, that is correct. FATAL might be the correct behaiviour. > > So +1 for FATAL. > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center
Re: pause recovery if pitr target not reached
"Peter Eisentraut" skrev 22. november 2019 kl. 11:50: > On 2019-11-21 13:50, Leif Gunnar Erlandsen wrote: > >> Pausing was choosen in the patch as pause was the expected behaivior if >> target was reached. > > Pausing is the expect behavior when the target is reached because that is the > default setting of > recovery_target_action. Your patch does not take recovery_target_action into > account. No it does not. It works well to demonstrate its purpose though. And it might be to stop with FATAL would be more correct. > > -- Peter Eisentraut http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pause recovery if pitr target not reached
Adding a patch written for 13dev from git.

"Michael Paquier" wrote on 1 December 2019 at 03:08:

> On Fri, Nov 22, 2019 at 11:26:59AM, Leif Gunnar Erlandsen wrote:
>
>> No, it does not. It works well to demonstrate its purpose, though.
>> And it might be that stopping with FATAL would be more correct.
>
> This is still under active discussion. Please note that the latest
> patch does not apply, so a rebase would be nice to have. I have moved
> the patch to the next CF, waiting on author.
> --
> Michael

0004-pause-recovery-if-pitr-target-not-reached.patch
Description: Binary data
Re: pause recovery if pitr target not reached
> "Peter Eisentraut" skrev 14. januar 2020 > kl. 21:13: > > On 2019-12-11 12:40, Leif Gunnar Erlandsen wrote: >> Adding patch written for 13dev from git >> "Michael Paquier" skrev 1. desember 2019 kl. 03:08: >> On Fri, Nov 22, 2019 at 11:26:59AM +, Leif Gunnar Erlandsen wrote: > >> No it does not. It works well to demonstrate its purpose though. >> And it might be to stop with FATAL would be more correct. > > This is still under active discussion. Please note that the latest > patch does not apply, so a rebase would be nice to have. I have moved > the patch to next CF, waiting on author. > > I reworked your patch a bit. I changed the outcome to be an error, as was > discussed. I also added > tests and documentation. Please take a look. Thank you, it was not unexpexted for the patch to be a little bit smaller. Although it would have been nice to log where recover ended before reporting fatal error. And since you use RECOVERY_TARGET_UNSET, RECOVERY_TARGET_IMMEDIATE also gets included, is this correct? > -- Peter Eisentraut http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pause recovery if pitr target not reached
Great job with the patch, Peter; it is even cleaner than before now that you have moved the check.

> "Peter Eisentraut" wrote on 27 January 2020 at 12:16:
Implement hook for self-join simplification
Hi,

I have made a small deductive database on top of PostgreSQL for educational/research purposes. In this setting, due to certain VIEW constructions, queries often end up being self-joins on primary keys, e.g.:

SELECT t1.id, t2.val
FROM t AS t1 JOIN t AS t2 USING (id);

where t(id) is a primary key. This query is equivalent to the much more efficient:

SELECT id, val FROM t AS t1;

However, PostgreSQL currently does not seem to implement this simplification. Therefore, I have looked into writing an extension that performs it, but I am struggling a bit with finding out when this simplification should be done, i.e. which hook I should implement.

The simplification is not too different from those done in prep/prepjointree.c, i.e. doing the simplification on the query tree directly. However, I think I would then need to implement a planner_hook, as it is the only hook giving me direct access to the query tree. But I need to perform my simplification after view definitions have been expanded into the query, and after the transformations in prepjointree.c (but before the rest of planning), and there seems to be no easy way to inject a function there, as this is buried deep in the middle of the planner function.

I therefore looked into using a set_join_pathlist_hook and trying to do the simplification at path level, i.e. doing something like:

static void
self_join_optimize_hook(PlannerInfo *root, RelOptInfo *joinrel,
                        RelOptInfo *outerrel, RelOptInfo *innerrel,
                        JoinType jointype, JoinPathExtraData *extra)
{
    if (is_selfjoin_on_pk(root, joinrel, extra))
    {
        ListCell   *p;

        foreach(p, innerrel->pathlist)
        {
            add_path(joinrel, (Path *) lfirst(p));
        }
    }
}

That is, if joinrel is a (binary) self-join on a primary key, the paths for evaluating the join are the same as the paths for evaluating the innerrel. However, this does not work, as the rest of the query may require values from the other table (e.g. t2 in the example above). I would therefore need to replace all mentions of t2 with t1, but is this possible at path level? If not, does anyone have an idea of how this could be done in a different way?

Thanks!

Kind regards,
Leif Harald Karlsen
Senior Lecturer
Department of Informatics
University of Oslo
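
For reference, the planner_hook wiring mentioned above follows the usual extension pattern; a minimal sketch (assuming a PostgreSQL 14-era hook signature, with simplify_self_joins_in_query() as a purely hypothetical placeholder for the rewrite itself) could look like:

#include "postgres.h"

#include "fmgr.h"
#include "nodes/params.h"
#include "optimizer/planner.h"

PG_MODULE_MAGIC;

/* Previously installed planner hook, so other extensions keep working. */
static planner_hook_type prev_planner_hook = NULL;

/* Hypothetical placeholder: collapse self-joins on unique keys in place. */
static void
simplify_self_joins_in_query(Query *parse)
{
    /* The real rewrite of parse->jointree / parse->rtable would go here. */
    (void) parse;
}

static PlannedStmt *
selfjoin_planner(Query *parse, const char *query_string,
                 int cursorOptions, ParamListInfo boundParams)
{
    /*
     * At this point the rewriter has already expanded views, but the
     * prepjointree.c transformations have not run yet -- the limitation
     * discussed in the mail above.
     */
    simplify_self_joins_in_query(parse);

    if (prev_planner_hook)
        return prev_planner_hook(parse, query_string, cursorOptions, boundParams);
    return standard_planner(parse, query_string, cursorOptions, boundParams);
}

void
_PG_init(void)
{
    prev_planner_hook = planner_hook;
    planner_hook = selfjoin_planner;
}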
Re: Implement hook for self-join simplification
Hi Andrey,

Thank you for the quick answer, and for the pointer to the patch! This looks like just the thing I need!

On a more general note: what would, in general, be the best way to implement such optimizations? Is there a good way to do this as an extension, or is a patch the preferred way?

Kind regards,
Leif Harald Karlsen
Senior Lecturer
Department of Informatics
University of Oslo

From: Andrey Lepikhov
Sent: 24 June 2022 19:27:50
To: Leif Harald Karlsen; pgsql-hackers@lists.postgresql.org
Subject: Re: Implement hook for self-join simplification

On 24/6/2022 18:58, Leif Harald Karlsen wrote:

> I have made a small deductive database on top of PostgreSQL for
> educational/research purposes. In this setting, due to certain
> VIEW constructions, queries often end up being self-joins on primary
> keys, e.g.:
>
> SELECT t1.id, t2.val
> FROM t AS t1 JOIN t AS t2 USING (id);
>
> where t(id) is a primary key. This query is equivalent to the much more
> efficient:
>
> SELECT id, val FROM t AS t1;
>
> However, PostgreSQL currently does not seem to implement this
> simplification. Therefore, I have looked into writing an extension that
> performs it, but I am struggling a bit with finding out when this
> simplification should be done, i.e. which hook I should implement.

It is true, but you can use a proposed patch that adds such functionality [1]. I tried to reproduce your case:

CREATE TABLE t(id int PRIMARY KEY, val text);
explain verbose SELECT t1.id, t2.val FROM t AS t1 JOIN t AS t2 USING (id);

With this patch you will get the plan:

Seq Scan on public.t t2
  Output: t2.id, t2.val
  Filter: (t2.id IS NOT NULL)

The approach implemented in this patch looks better because it removes self-joins at an earlier stage than path generation. Feel free to use it in your research.

[1] https://www.postgresql.org/message-id/a1d6290c-44e0-0dfc-3fca-66a68b310...@postgrespro.ru

--
regards,
Andrey Lepikhov
Postgres Professional