On Tue, Feb 28, 2023 at 11:29 PM Nathan Bossart
wrote:
> On Sat, Feb 25, 2023 at 11:00:31AM -0800, Andres Freund wrote:
> > TBH, I think the current archive and restore module APIs aren't useful. I
> > think it was a mistake to add archive modules without having demonstrated
> > that
> > one can
Greetings,
* Nathan Bossart (nathandboss...@gmail.com) wrote:
> On Sat, Feb 25, 2023 at 11:00:31AM -0800, Andres Freund wrote:
> > TBH, I think the current archive and restore module APIs aren't useful. I
> > think it was a mistake to add archive modules without having demonstrated
> > that
> > o
On Sat, Feb 25, 2023 at 11:00:31AM -0800, Andres Freund wrote:
> TBH, I think the current archive and restore module APIs aren't useful. I
> think it was a mistake to add archive modules without having demonstrated that
> one can do something useful with them that the restore_command didn't already
Hi,
On 2023-02-19 20:06:24 +0530, Robert Haas wrote:
> On Sun, Feb 19, 2023 at 2:45 AM Andres Freund wrote:
> > To me that seems even simpler? Nothing but the archiver is supposed to
> > create
> > .done files and nothing is supposed to remove .ready files without archiver
> > having created the
On Wed, Feb 22, 2023 at 09:48:10PM +1300, Thomas Munro wrote:
> On Tue, Feb 21, 2023 at 5:50 PM Nathan Bossart
> wrote:
>> On Tue, Feb 21, 2023 at 09:03:27AM +0900, Michael Paquier wrote:
>> > Perhaps beginning a new thread with a patch and a summary would be
>> > better at this stage? Another t
On Tue, Feb 21, 2023 at 5:50 PM Nathan Bossart wrote:
> On Tue, Feb 21, 2023 at 09:03:27AM +0900, Michael Paquier wrote:
> > Perhaps beginning a new thread with a patch and a summary would be
> > better at this stage? Another thing I am wondering is if it could be
> > possible to test that rather
On Tue, Feb 21, 2023 at 09:03:27AM +0900, Michael Paquier wrote:
> On Tue, Feb 14, 2023 at 09:47:55AM -0800, Nathan Bossart wrote:
>> Here is a new version of the stopgap/back-branch fix for restore_command.
>> This is more or less a rebased version of v4 with an added stderr message
>> as Andres s
On Tue, Feb 21, 2023 at 1:03 PM Michael Paquier wrote:
> Perhaps beginning a new thread with a patch and a summary would be
> better at this stage? Another thing I am wondering is if it could be
> possible to test that rather reliably. I have been playing with a few
> scenarios like holding the
On Tue, Feb 14, 2023 at 09:47:55AM -0800, Nathan Bossart wrote:
> Here is a new version of the stopgap/back-branch fix for restore_command.
> This is more or less a rebased version of v4 with an added stderr message
> as Andres suggested upthread.
So, this thread has moved around many subjects, st
Greetings,
* Michael Paquier (mich...@paquier.xyz) wrote:
> On Sun, Feb 19, 2023 at 08:06:24PM +0530, Robert Haas wrote:
> > I mean, my idea was to basically just have one big callback:
> > ArchiverModuleMainLoopCB(). Which wouldn't return, or perhaps, would
> > only return when archiving was tota
On Sun, Feb 19, 2023 at 08:06:24PM +0530, Robert Haas wrote:
> I mean, my idea was to basically just have one big callback:
> ArchiverModuleMainLoopCB(). Which wouldn't return, or perhaps, would
> only return when archiving was totally caught up and there was nothing
> more to do right now. And the
On Sun, Feb 19, 2023 at 2:45 AM Andres Freund wrote:
> To me that seems even simpler? Nothing but the archiver is supposed to create
> .done files and nothing is supposed to remove .ready files without archiver
> having created the .done files. So the archiver process can scan
> archive_status un
Hi,
On 2023-02-18 15:51:06 +0530, Robert Haas wrote:
> On Thu, Feb 16, 2023 at 10:02 PM Andres Freund wrote:
> > But there's nothing inherent in that. We know for certain which files we're
> > going to archive. And we don't need to work one-by-one. The archiver could
> > just start multiple subpr
On Thu, Feb 16, 2023 at 10:02 PM Andres Freund wrote:
> But there's nothing inherent in that. We know for certain which files we're
> going to archive. And we don't need to work one-by-one. The archiver could
> just start multiple subprocesses at the same time.
But what if it doesn't want to star
On Thu, Feb 16, 2023 at 10:28 PM Nathan Bossart
wrote:
> > Hmm. So in this design, the archiver doesn't really do the archiving
> > any more, because the interface makes that impossible. It has to use a
> > separate background worker process for that, full stop.
> >
> > I don't think that's a good
On Thu, Feb 16, 2023 at 03:08:14PM +0530, Robert Haas wrote:
> On Thu, Feb 9, 2023 at 10:53 PM Nathan Bossart
> wrote:
>> I've been thinking about this, actually. I'm wondering if we could provide
>> a list of files to the archiving callback (configurable via a variable in
>> ArchiveModuleState)
Hi,
On 2023-02-16 15:18:57 +0530, Robert Haas wrote:
> On Fri, Feb 10, 2023 at 12:59 AM Andres Freund wrote:
> > I don't think it's that hard to imagine problems. To be reasonably fast, a
> > decent restore implementation will have to 'restore ahead'. Which also
> > provides ample things to go wr
On Fri, Feb 10, 2023 at 12:59 AM Andres Freund wrote:
> I'm somewhat concerned about that too, but perhaps from a different
> angle. First, I think we don't do our users a service by defaulting the
> in-core implementation to something that doesn't scale to even a moderately
> busy server.
+1.
>
On Thu, Feb 9, 2023 at 10:53 PM Nathan Bossart wrote:
> I've been thinking about this, actually. I'm wondering if we could provide
> a list of files to the archiving callback (configurable via a variable in
> ArchiveModuleState), and then have the callback return a list of files that
> are archiv
Here is a new version of the stopgap/back-branch fix for restore_command.
This is more or less a rebased version of v4 with an added stderr message
as Andres suggested upthread.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 20edca59834c7755bfddb070fb9db3f59dc6ff96 Mon Sep 17
Hi,
On 2023-02-09 11:12:21 -0500, Robert Haas wrote:
> On Thu, Feb 9, 2023 at 10:51 AM Tom Lane wrote:
> > I'm fairly concerned about the idea of making it common for people
> > to write their own main loop for the archiver. That means that, if
> > we have a bug fix that requires the archiver to
On Thu, Feb 09, 2023 at 11:12:21AM -0500, Robert Haas wrote:
> On Thu, Feb 9, 2023 at 10:51 AM Tom Lane wrote:
>> If we think we need primitives to let the archiver hooks get all
>> the pending files, or whatever, by all means add those. But don't
>> cede fundamental control of the archiver. The
On Thu, Feb 9, 2023 at 10:51 AM Tom Lane wrote:
> I'm fairly concerned about the idea of making it common for people
> to write their own main loop for the archiver. That means that, if
> we have a bug fix that requires the archiver to do X, we will not
> just be patching our own code but trying
Robert Haas writes:
> I think that we could certainly, as Michael suggests, have people
> provide their own background worker rather than having the archiver
> invoke the user-supplied code directly. As long as the functions that
> you need in order to get the necessary information can be called f
On Wed, Feb 8, 2023 at 7:24 PM Nathan Bossart wrote:
> On Thu, Feb 09, 2023 at 08:56:24AM +0900, Michael Paquier wrote:
> > On Wed, Feb 08, 2023 at 02:25:54PM -0800, Nathan Bossart wrote:
> >> These are all good points. Perhaps there could be a base archiver
> >> implementation that shell_archive
On Thu, Feb 09, 2023 at 08:56:24AM +0900, Michael Paquier wrote:
> On Wed, Feb 08, 2023 at 02:25:54PM -0800, Nathan Bossart wrote:
>> These are all good points. Perhaps there could be a base archiver
>> implementation that shell_archive uses (and that other modules could use if
>> desired, which m
On Wed, Feb 08, 2023 at 02:25:54PM -0800, Nathan Bossart wrote:
> These are all good points. Perhaps there could be a base archiver
> implementation that shell_archive uses (and that other modules could use if
> desired, which might be important for backward compatibility with the
> existing callb
On Wed, Feb 08, 2023 at 04:24:15PM -0500, Robert Haas wrote:
> On Wed, Feb 8, 2023 at 12:43 PM Nathan Bossart
> wrote:
>> I think this could be a good approach if we decide not to bake too much
>> into PostgreSQL itself (e.g., such as creating multiple archive workers
>> that each call out to the
On Wed, Feb 8, 2023 at 12:43 PM Nathan Bossart wrote:
> I think this could be a good approach if we decide not to bake too much
> into PostgreSQL itself (e.g., such as creating multiple archive workers
> that each call out to the module). Archive module authors would
> effectively need to write t
On Wed, Feb 08, 2023 at 10:22:24AM -0500, Robert Haas wrote:
> I felt like the archive modules work was a step forward when we did,
> because basic_archive does some things that you're not likely to get
> right if you do it on your own. And a similar approach to
> restore_command might be also be v
On Sun, Feb 5, 2023 at 7:46 PM Nathan Bossart wrote:
> Got it. I suspect we'll want to do something similar for archive modules
> eventually, too.
+1.
I felt like the archive modules work was a step forward when we did,
because basic_archive does some things that you're not likely to get
right
On Sun, Feb 05, 2023 at 04:07:50PM -0800, Andres Freund wrote:
> On 2023-02-05 15:57:47 -0800, Nathan Bossart wrote:
>> I agree that the shell overhead isn't the main performance issue,
>> but it's unclear to me how much of this should be baked into
>> PostgreSQL.
>
> I don't know fully either. Bu
Hi,
On 2023-02-05 15:57:47 -0800, Nathan Bossart wrote:
> > For the segment files, we'd likely need a parameter to indicate whether
> > the restore is random or not.
>
> Wouldn't this approach still require each module to handle restoring ahead
> of time?
Yes, to some degree at least. I was just
On Sun, Feb 05, 2023 at 03:01:57PM -0800, Andres Freund wrote:
> I think at the very least you'd want to have a separate callback for
> restoring segments than for restoring other files. But more likely a
> separate callback for each type of file to be restored.
>
> For the timeline history case a
On Sun, Feb 05, 2023 at 09:49:57AM +0900, Michael Paquier wrote:
> Yes, at this stage a revert of the refactoring with shell_restore.c is
> the best path forward.
Done that now, as of 2f6e15a.
--
Michael
signature.asc
Description: PGP signature
Hi,
On 2023-02-05 14:19:38 -0800, Nathan Bossart wrote:
> On Sun, Feb 05, 2023 at 09:49:57AM +0900, Michael Paquier wrote:
> > - Should we include archive_cleanup_command into the recovery modules
> > at all? We've discussed offloading that from the checkpointer, and it
> > makes the failure hand
On Sun, Feb 05, 2023 at 09:49:57AM +0900, Michael Paquier wrote:
> - Should we include archive_cleanup_command into the recovery modules
> at all? We've discussed offloading that from the checkpointer, and it
> makes the failure handling trickier when it comes to unexpected GUC
> configurations, f
Hi,
On 2023-02-04 10:03:54 -0800, Nathan Bossart wrote:
> On Sat, Feb 04, 2023 at 03:30:29AM -0800, Andres Freund wrote:
> > That's kind of my problem with these changes. They try to introduce new
> > abstraction layers, but don't provide real abstraction, because they're
> > very tightly bound to
On Sat, Feb 04, 2023 at 10:03:54AM -0800, Nathan Bossart wrote:
> Okay. Michael, why don't we revert the shell_restore stuff for now? Once
> the archive modules interface changes and the fix for this
> SIGTERM-during-system() problem are in, I will work through this feedback
> and give recovery m
On Sat, Feb 04, 2023 at 03:30:29AM -0800, Andres Freund wrote:
> That's kind of my problem with these changes. They try to introduce new
> abstraction layers, but don't provide real abstraction, because they're
> very tightly bound to the way the functions were called before the
> refactoring. And
Hi,
On 2023-02-02 11:06:19 +0900, Michael Paquier wrote:
> > - the error message for a failed restore command seems to have gotten
> > worse:
> > could not restore file \"%s\" from archive: %s"
> > ->
> > "%s \"%s\": %s", commandName, command
>
> IMO, we don't lose any context with this m
Hi,
On 2023-02-03 10:54:17 -0800, Nathan Bossart wrote:
> @@ -146,7 +146,25 @@ ExecuteRecoveryCommand(const char *command, const char
> *commandName,
>*/
> fflush(NULL);
> pgstat_report_wait_start(wait_event_info);
> +
> + /*
> + * PreRestoreCommand() is used to tell
On Fri, Feb 03, 2023 at 10:54:17AM -0800, Nathan Bossart wrote:
> 0001 is just v1-0001 from upthread. This moves Pre/PostRestoreCommand to
> surround only the call to system(). I think this should get us closer to
> pre-v15 behavior.
+ if (exitOnSigterm)
+ PreRestoreCommand();
+
rc =
On Thu, Feb 02, 2023 at 11:44:22PM -0800, Andres Freund wrote:
> On 2023-02-03 02:24:03 -0500, Tom Lane wrote:
>> Andres Freund writes:
>> > A workaround for the back branches could be to have a test in
>> > StartupProcShutdownHandler() that tests if MyProcPid == getpid(), and
>> > not do the proc
Hi,
On February 3, 2023 9:19:23 AM GMT+01:00, Thomas Munro
wrote:
>On Fri, Feb 3, 2023 at 9:09 PM Andres Freund wrote:
>> Thinking about popen() suggests that we have a similar problem with COPY
>> FROM PROGRAM as we have in pgarch (i.e. not as bad as the startup
>> process issue, but still no
On Fri, Feb 3, 2023 at 9:09 PM Andres Freund wrote:
> Thinking about popen() suggests that we have a similar problem with COPY
> FROM PROGRAM as we have in pgarch (i.e. not as bad as the startup
> process issue, but still not great, due to
> procsignal_sigusr1_handler()).
A small mercy: while we
Hi,
On 2023-02-03 02:50:38 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2023-02-03 02:24:03 -0500, Tom Lane wrote:
> >> setsid(2) is required since SUSv2, so I'm not sure which systems
> >> are of concern here ... other than Redmond's of course.
>
> > I was thinking of windows, yes.
>
>
Hi,
On 2023-02-03 20:34:36 +1300, Thomas Munro wrote:
> What if we block signals, fork, then in the child, install the default
> SIGTERM handler, then unblock, and then exec the shell?
Yep. I was momentarily wondering why we'd even need to unblock signals,
but while exec (et al) reset the signal
Andres Freund writes:
> On 2023-02-03 02:24:03 -0500, Tom Lane wrote:
>> setsid(2) is required since SUSv2, so I'm not sure which systems
>> are of concern here ... other than Redmond's of course.
> I was thinking of windows, yes.
But given the lack of fork(2), Windows requires a completely
diff
Thomas Munro writes:
> On Fri, Feb 3, 2023 at 8:35 PM Andres Freund wrote:
>> we don't have a hard dependency on setsid()
> FTR There are no Unixes without setsid()...
Yeah. What I just got done reading in SUSv2 (1997) is
"Derived from the POSIX.1-1988 standard". We need not
concern ourselves
Hi,
On 2023-02-03 02:24:03 -0500, Tom Lane wrote:
> Andres Freund writes:
> > Ugh, I think I might understand what's happening:
>
> > The signal arrives just after the fork() (within system()). Because we
> > have all our processes configure themselves as process group leaders,
> > and we signal
On Fri, Feb 3, 2023 at 8:35 PM Andres Freund wrote:
> we don't have a
> hard dependency on setsid()
FTR There are no Unixes without setsid()... HAVE_SETSID was only left
in the tree because we were discussing whether to replace it with
!defined(WIN32) or whether that somehow made things more con
Hi,
On 2023-02-02 14:39:19 -0800, Nathan Bossart wrote:
> Actually, this still doesn't really explain why we need to exit immediately
> in the SIGTERM handler for restore_command. We already have handling for
> when the command indicates it exited due to SIGTERM, so it should be no
> problem if t
On Fri, Feb 3, 2023 at 8:24 PM Tom Lane wrote:
> Andres Freund writes:
> > Ugh, I think I might understand what's happening:
>
> > The signal arrives just after the fork() (within system()). Because we
> > have all our processes configure themselves as process group leaders,
> > and we signal the
Andres Freund writes:
> Ugh, I think I might understand what's happening:
> The signal arrives just after the fork() (within system()). Because we
> have all our processes configure themselves as process group leaders,
> and we signal the entire process group (c.f. signal_child()), both the
> chi
Hi,
On 2023-02-02 10:23:21 +0900, Michael Paquier wrote:
> On Wed, Feb 01, 2023 at 10:18:27AM -0800, Andres Freund wrote:
> > On 2023-02-01 12:27:19 -0500, Tom Lane wrote:
> >> Andres Freund writes:
> >> The main thing that system() brings to the table is platform-specific
> >> knowledge of where
On Thu, Feb 02, 2023 at 02:39:19PM -0800, Nathan Bossart wrote:
> Maybe we could just
> remove this exit-in-SIGTERM-handler business...
I've spent some time testing this. It seems to work pretty well, but only
if I keep the exit-on-SIGTERM logic in shell_restore(). Without that, I'm
seeing delay
On Thu, Feb 02, 2023 at 02:01:13PM -0800, Nathan Bossart wrote:
> I've been digging into the history here. This e-mail seems to have the
> most context [0]. IIUC this was intended to prevent "fast" shutdowns from
> escalating to "immediate" shutdowns because the restore command died
> unexpectedl
On Thu, Feb 02, 2023 at 04:14:54PM -0500, Robert Haas wrote:
> + /*
> +* When exitOnSigterm is set and we are in the startup process, use
> the
> +* special wrapper for system() that enables exiting immediately upon
> +* receiving SIGTERM. This ensures we can break o
On Thu, Feb 2, 2023 at 3:10 PM Nathan Bossart wrote:
> Hm. I don't know if we want to encourage further use of
> in_restore_command since it seems to be prone to misuse. Here's a v2 that
> demonstrateѕ Tom's idea (bikeshedding on names and comments is welcome). I
> personally like this approach
On Thu, Feb 02, 2023 at 01:24:15PM +0900, Michael Paquier wrote:
> On Wed, Feb 01, 2023 at 09:34:44PM -0500, Tom Lane wrote:
>> I was vaguely wondering about removing both of those functions
>> in favor of an integrated function that does a system() call
>> with those things before and after it.
>
On Wed, Feb 01, 2023 at 09:34:44PM -0500, Tom Lane wrote:
> Michael Paquier writes:
>> Hmm. Isn't that something that we should also document in startup.c
>> where both routines are defined? If we begin to use
>> PreRestoreCommand() and PostRestoreCommand() in more code paths in the
>> future, t
Michael Paquier writes:
> Hmm. Isn't that something that we should also document in startup.c
> where both routines are defined? If we begin to use
> PreRestoreCommand() and PostRestoreCommand() in more code paths in the
> future, that could be again an issue.
I was vaguely wondering about remo
On Wed, Feb 01, 2023 at 02:35:55PM -0800, Nathan Bossart wrote:
> Here is a first draft for the proposed stopgap fix. If we want to proceed
> with this, I can provide patches for the back branches.
> + /*
> + * PreRestoreCommand() is used to tell the SIGTERM handler for the
> startup
>
On Wed, Feb 01, 2023 at 10:18:27AM -0800, Andres Freund wrote:
> On 2023-02-01 12:27:19 -0500, Tom Lane wrote:
>> Andres Freund writes:
>> The main thing that system() brings to the table is platform-specific
>> knowledge of where the shell is. I'm not very sure that we want to
>> wire in "/bin/s
On Wed, Feb 01, 2023 at 09:58:06AM -0800, Nathan Bossart wrote:
> On Wed, Feb 01, 2023 at 08:58:01AM -0800, Andres Freund wrote:
>> On 2023-02-01 10:12:26 -0500, Tom Lane wrote:
>>> The fundamental issue is that we have no good way to break out
>>> of system(), and I think the original idea was tha
Hi,
On 2023-02-01 12:27:19 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2023-02-01 12:08:24 -0500, Robert Haas wrote:
> >> I like the idea of not relying on system(). In most respects, doing
> >> fork() + exec() ourselves seems superior. We can control where the
> >> output goes, what we
On Wed, Feb 01, 2023 at 08:58:01AM -0800, Andres Freund wrote:
> On 2023-02-01 10:12:26 -0500, Tom Lane wrote:
>> The fundamental issue is that we have no good way to break out
>> of system(), and I think the original idea was that
>> in_restore_command would be set *only* for the duration of the
>
Andres Freund writes:
> On 2023-02-01 12:08:24 -0500, Robert Haas wrote:
>> I like the idea of not relying on system(). In most respects, doing
>> fork() + exec() ourselves seems superior. We can control where the
>> output goes, what we do while waiting, etc. But system() runs the
>> command thro
Hi,
On 2023-02-01 12:08:24 -0500, Robert Haas wrote:
> On Wed, Feb 1, 2023 at 11:58 AM Andres Freund wrote:
> > > 9a740f81e clearly made things a lot worse, but it wasn't great
> > > before. Can we see a way forward to removing the problem entirely?
> >
> > Yea, I think we can - we should stop r
On Wed, Feb 1, 2023 at 11:58 AM Andres Freund wrote:
> > 9a740f81e clearly made things a lot worse, but it wasn't great
> > before. Can we see a way forward to removing the problem entirely?
>
> Yea, I think we can - we should stop relying on system(). If we instead
> run the command properly as
Hi,
On 2023-02-01 10:12:26 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2023-02-01 16:21:16 +1300, Thomas Munro wrote:
> >> It's always in proc_exit() in StartupProcShutdownHandler(), a SIGTERM
> >> handler which is allowed to call that while in_restore_command is
> >> true.
>
> > Ugh, n
On Wed, Feb 01, 2023 at 10:12:26AM -0500, Tom Lane wrote:
> Andres Freund writes:
>> On 2023-02-01 16:21:16 +1300, Thomas Munro wrote:
>>> It's always in proc_exit() in StartupProcShutdownHandler(), a SIGTERM
>>> handler which is allowed to call that while in_restore_command is
>>> true.
>
>> Ugh
Andres Freund writes:
> On 2023-02-01 16:21:16 +1300, Thomas Munro wrote:
>> It's always in proc_exit() in StartupProcShutdownHandler(), a SIGTERM
>> handler which is allowed to call that while in_restore_command is
>> true.
> Ugh, no wonder we're getting crashes. This whole business seems bogus
Hi,
On 2023-02-01 16:21:16 +1300, Thomas Munro wrote:
> My database off assertion failures has four like that, all 15 and master:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2023-02-01%2001:05:17
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=202
My database off assertion failures has four like that, all 15 and master:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2023-02-01%2001:05:17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2023-01-11%2011:16:40
https://buildfarm.postgresql.org/cgi-bin/sho
Hi,
On 2023-02-01 10:53:17 +0900, Michael Paquier wrote:
> While browsing the buildfarm, I have noticed this failure on curcilio:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2023-02-01%2001%3A05%3A17
>
> The test that has reported a failure is the check on the archive
>
77 matches
Mail list logo