Re: parallelizing the archiver

2021-11-01 Thread Bossart, Nathan
On 11/1/21, 10:57 AM, "Stephen Frost" wrote: > Definitely interested and plan to look at this more shortly, and > generally this all sounds good, but maybe we should have it be posted > under a new thread as it's moved pretty far from the subject and folks > might not appreciate what this is about

Re: parallelizing the archiver

2021-11-01 Thread Stephen Frost
Greetings, * Bossart, Nathan (bossa...@amazon.com) wrote: > On 10/25/21, 1:41 PM, "Bossart, Nathan" wrote: > > Great. Unless I see additional feedback on the basic design shortly, > > I'll give the documentation updates a try. > > Okay, here is a more complete patch with a first attempt at the

Re: parallelizing the archiver

2021-10-25 Thread Bossart, Nathan
On 10/25/21, 1:29 PM, "Robert Haas" wrote: > On Mon, Oct 25, 2021 at 3:45 PM Bossart, Nathan wrote: >> Alright, here is an attempt at that. With this revision, archive >> libraries are preloaded (and _PG_init() is called), and the archiver >> is responsible for calling _PG_archive_module_init()

Re: parallelizing the archiver

2021-10-25 Thread Robert Haas
On Mon, Oct 25, 2021 at 3:45 PM Bossart, Nathan wrote: > Alright, here is an attempt at that. With this revision, archive > libraries are preloaded (and _PG_init() is called), and the archiver > is responsible for calling _PG_archive_module_init() to get the > callbacks. I've also removed the GU
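The division of labor described in this message (the library is loaded first, then the archiver asks it for its callbacks) can be illustrated with a minimal, self-contained sketch. Everything here is an assumption made for illustration: the struct, its single field, and the `sketch_*` function names are hypothetical stand-ins and are not taken from the patch. `sketch_archive_module_init()` plays the role of `_PG_archive_module_init()`, and no real dynamic loading is performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback table; the field name is illustrative only. */
typedef struct SketchArchiveCallbacks
{
    bool (*archive_file) (const char *file, const char *path);
} SketchArchiveCallbacks;

/* Toy callback: "archives" a file by checking that both names are non-empty. */
static bool
sketch_archive_file(const char *file, const char *path)
{
    return file != NULL && path != NULL &&
           file[0] != '\0' && path[0] != '\0';
}

/*
 * Hypothetical stand-in for the module's _PG_archive_module_init():
 * the module fills in the callbacks it provides.
 */
void
sketch_archive_module_init(SketchArchiveCallbacks *cb)
{
    assert(cb != NULL);
    cb->archive_file = sketch_archive_file;
}

/*
 * Hypothetical stand-in for the archiver: it asks the module for its
 * callbacks, then archives one file through them.
 */
bool
sketch_archiver_archive(const char *file, const char *path)
{
    SketchArchiveCallbacks cb = {NULL};

    sketch_archive_module_init(&cb);
    if (cb.archive_file == NULL)
        return false;           /* module did not provide the callback */
    return cb.archive_file(file, path);
}
```

The point of the indirection is that the archiver only depends on the callback table, so a module is free to do other setup elsewhere (the thread mentions `_PG_init()` for that) without the archiver knowing about it.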

Re: parallelizing the archiver

2021-10-25 Thread Bossart, Nathan
On 10/25/21, 10:18 AM, "Robert Haas" wrote: > On Mon, Oct 25, 2021 at 1:14 PM Bossart, Nathan wrote: >> IIUC this would mean that archive modules that need to define GUCs or >> register background workers would have to separately define a >> _PG_init() and be loaded via shared_preload_libraries i

Re: parallelizing the archiver

2021-10-25 Thread Robert Haas
On Mon, Oct 25, 2021 at 1:14 PM Bossart, Nathan wrote: > IIUC this would mean that archive modules that need to define GUCs or > register background workers would have to separately define a > _PG_init() and be loaded via shared_preload_libraries in addition to > archive_library. That doesn't see

Re: parallelizing the archiver

2021-10-25 Thread Bossart, Nathan
On 10/25/21, 10:02 AM, "Robert Haas" wrote: > I don't see why this patch should need to make any changes to > internal_load_library(), PostmasterMain(), SubPostmasterMain(), or any > other central point of control, and I don't think it should. > pgarch_archiveXlog() can just load the library at th

Re: parallelizing the archiver

2021-10-25 Thread Robert Haas
On Sun, Oct 24, 2021 at 2:15 AM Bossart, Nathan wrote: > Here is an attempt at doing this. Archive modules are expected to > declare _PG_archive_module_init(), which can define GUCs, register > background workers, etc. This function must at least define the > archive callbacks. For now, I've in

Re: parallelizing the archiver

2021-10-25 Thread Stephen Frost
Greetings, * Magnus Hagander (mag...@hagander.net) wrote: > On Thu, Oct 21, 2021 at 11:05 PM Robert Haas wrote: > > On Thu, Oct 21, 2021 at 4:29 PM Stephen Frost wrote: > > > restore_command used to be in recovery.conf, which disappeared with v12 > > > and it now has to go into postgresql.auto.c

Re: parallelizing the archiver

2021-10-22 Thread Robert Haas
On Fri, Oct 22, 2021 at 1:42 PM Bossart, Nathan wrote: > Another approach would be to add a new initialization function (e.g., > PG_archive_init()) that would be used if the library is being loaded > from archive_library. Like before, you can use the library for both > shared_preload_libraries an

Re: parallelizing the archiver

2021-10-22 Thread Bossart, Nathan
On 10/22/21, 7:43 AM, "Magnus Hagander" wrote: > I still like the idea of loading the library via a special > parameter, archive_library or such. My first attempt [0] added a GUC like this, so I can speak to some of the interesting design decisions that follow. The simplest thing we could do wou

Re: parallelizing the archiver

2021-10-22 Thread Magnus Hagander
On Thu, Oct 21, 2021 at 9:51 PM Bossart, Nathan wrote: > On 10/20/21, 3:23 PM, "Bossart, Nathan" wrote: > > Alright, I reworked the patch a bit to maintain backward > > compatibility. My initial intent for 0001 was to just do a clean > > refactor to move the shell archiving stuff to its own fil

Re: parallelizing the archiver

2021-10-22 Thread Magnus Hagander
On Thu, Oct 21, 2021 at 11:05 PM Robert Haas wrote: > On Thu, Oct 21, 2021 at 4:29 PM Stephen Frost wrote: > > restore_command used to be in recovery.conf, which disappeared with v12 > > and it now has to go into postgresql.auto.conf or postgresql.conf. > > > > That's a huge breaking change. > >

Re: parallelizing the archiver

2021-10-21 Thread Robert Haas
On Thu, Oct 21, 2021 at 4:29 PM Stephen Frost wrote: > restore_command used to be in recovery.conf, which disappeared with v12 > and it now has to go into postgresql.auto.conf or postgresql.conf. > > That's a huge breaking change. Not in the same sense. Moving the functionality to a different con

Re: parallelizing the archiver

2021-10-21 Thread Stephen Frost
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Tue, Oct 19, 2021 at 2:50 PM Stephen Frost wrote: > > I keep seeing this thrown around and I don't quite get why we feel this > > is the case. I'm not completely against trying to maintain backwards > > compatibility, but at the same t

Re: parallelizing the archiver

2021-10-21 Thread Robert Haas
On Tue, Oct 19, 2021 at 2:50 PM Stephen Frost wrote: > I keep seeing this thrown around and I don't quite get why we feel this > is the case. I'm not completely against trying to maintain backwards > compatibility, but at the same time, we just went through changing quite > a bit around in v12 wi

Re: parallelizing the archiver

2021-10-19 Thread Stephen Frost
Greetings, * Magnus Hagander (mag...@hagander.net) wrote: > Backwards compatibility is definitely a must, I'd say. Regardless of > exactly how the backwards-compatible plugin is shipped, it should be what's > turned on by default. I keep seeing this thrown around and I don't quite get why we feel

Re: parallelizing the archiver

2021-10-19 Thread Robert Haas
On Tue, Oct 19, 2021 at 10:19 AM Magnus Hagander wrote: > But, is logical decoding really that great an example? I mean, we build > pgoutput.so as a library, we don't provide it compiled-in. So we could build > the "shell archiver" based on that pattern, in which case we should create a > postm

Re: parallelizing the archiver

2021-10-19 Thread Bossart, Nathan
On 10/19/21, 6:39 AM, "David Steele" wrote: > On 10/19/21 8:50 AM, Robert Haas wrote: >> I am not quite sure why we wouldn't just compile the functions into >> the server. Functions pointers can point to core functions as surely >> as loadable modules. The present design isn't too congenial to tha

Re: parallelizing the archiver

2021-10-19 Thread Magnus Hagander
On Tue, Oct 19, 2021 at 2:50 PM Robert Haas wrote: > On Mon, Oct 18, 2021 at 7:25 PM Bossart, Nathan > wrote: > > I think the biggest question is where to put the archive_command > > module, which I've called shell_archive. The only existing directory > > that looked to me like it might work is

Re: parallelizing the archiver

2021-10-19 Thread David Steele
On 10/19/21 8:50 AM, Robert Haas wrote: On Mon, Oct 18, 2021 at 7:25 PM Bossart, Nathan wrote: I think the biggest question is where to put the archive_command module, which I've called shell_archive. The only existing directory that looked to me like it might work is src/test/modules. It mig

Re: parallelizing the archiver

2021-10-19 Thread Robert Haas
On Mon, Oct 18, 2021 at 7:25 PM Bossart, Nathan wrote: > I think the biggest question is where to put the archive_command > module, which I've called shell_archive. The only existing directory > that looked to me like it might work is src/test/modules. It might be > rather bold to relegate this

Re: parallelizing the archiver

2021-10-06 Thread Magnus Hagander
On Tue, Oct 5, 2021 at 5:32 AM Bossart, Nathan wrote: > On 10/4/21, 8:19 PM, "Stephen Frost" wrote: > > It's also been discussed, at least around the water cooler (as it were > > in pandemic times- aka our internal slack channels..) that the existing > > archive command might be reimplemented as

Re: parallelizing the archiver

2021-10-04 Thread Bossart, Nathan
On 10/4/21, 8:19 PM, "Stephen Frost" wrote: > It's also been discussed, at least around the water cooler (as it were > in pandemic times- aka our internal slack channels..) that the existing > archive command might be reimplemented as an extension using these. Not > sure if that's really necessar

Re: parallelizing the archiver

2021-10-04 Thread Stephen Frost
Greetings, * Bossart, Nathan (bossa...@amazon.com) wrote: > On 10/4/21, 7:21 PM, "Stephen Frost" wrote: > > This is something we've contemplated quite a bit and the last thing > > that I'd want to have is a requirement to configure a whole bunch of > > additional parameters to enable this. Why

Re: parallelizing the archiver

2021-10-04 Thread Bossart, Nathan
On 10/4/21, 7:21 PM, "Stephen Frost" wrote: > This is something we've contemplated quite a bit and the last thing > that I'd want to have is a requirement to configure a whole bunch of > additional parameters to enable this. Why do we need to have so many > new GUCs? I would have thought we'd

Re: parallelizing the archiver

2021-10-04 Thread Stephen Frost
Greetings, * Bossart, Nathan (bossa...@amazon.com) wrote: > On 10/1/21, 12:08 PM, "Andrey Borodin" wrote: > > On 30 Sep 2021, at 09:47, Bossart, Nathan wrote: > >> Of course, there are drawbacks to using an extension. Besides the > >> obvious added complexity of building an extension in C

Re: parallelizing the archiver

2021-10-01 Thread Bossart, Nathan
On 10/1/21, 12:08 PM, "Andrey Borodin" wrote: > On 30 Sep 2021, at 09:47, Bossart, Nathan wrote: >> I tested the sample archive_command in the docs against the sample >> archive_library implementation in the patch, and I saw about a 50% >> speedup. (The archive_library actually syncs the f

Re: parallelizing the archiver

2021-10-01 Thread Andrey Borodin
> On 30 Sep 2021, at 09:47, Bossart, Nathan wrote: > > The attached patch is a first try at adding alternatives for > archive_command Looks like an interesting alternative design. > I tested the sample archive_command in the docs against the sample > archive_library implementation in th

Re: parallelizing the archiver

2021-10-01 Thread Bossart, Nathan
On 9/29/21, 9:49 PM, "Bossart, Nathan" wrote: > I'm sure there are other ways to approach this, but I thought I'd give > it a try to see what was possible and to get the conversation started. BTW I am also considering the background worker approach that was mentioned upthread. My current thinkin

Re: parallelizing the archiver

2021-09-14 Thread Julien Rouhaud
On Wed, Sep 15, 2021 at 4:14 AM Stephen Frost wrote: > > > > I'm not proposing to remove existing archive_command. Just deprecate its > > > one-WAL-per-call form. > > > > Which is a big API break. > > We definitely need to stop being afraid of this. We completely changed > around how restores work

Re: parallelizing the archiver

2021-09-14 Thread Stephen Frost
Greetings, * Julien Rouhaud (rjuju...@gmail.com) wrote: > On Fri, Sep 10, 2021 at 2:03 PM Andrey Borodin wrote: > > > On 10 Sep 2021, at 10:52, Julien Rouhaud wrote: > > > Yes, but it also means that it's up to every single archiving tool to > > > implement a somewhat hackish parallel vers

Re: parallelizing the archiver

2021-09-10 Thread Andrey Borodin
> On 10 Sep 2021, at 22:18, Bossart, Nathan wrote: > > I was thinking that archive_batch_size would be the maximum batch > size. If the archiver only finds a single file to archive, that's all > it'd send to the archive command. If it finds more, it'd send up to > archive_batch_size to
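The "maximum, not required, batch size" behavior described in this message can be sketched as a one-line rule. This is only an illustration under stated assumptions: `next_batch_size()` is a hypothetical helper, `max_batch` stands in for the proposed `archive_batch_size` setting, and nothing here is taken from an actual patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: how many files to hand to one archive call.
 *
 * ready_files: number of WAL files currently waiting to be archived.
 * max_batch:   stand-in for the proposed archive_batch_size setting.
 */
size_t
next_batch_size(size_t ready_files, size_t max_batch)
{
    assert(max_batch > 0);      /* a zero cap would never make progress */

    /* Never wait for a full batch: send what is ready, capped at the max. */
    return ready_files < max_batch ? ready_files : max_batch;
}
```

Under this rule a quiet system with one ready file archives it immediately, while a busy system with a backlog sends at most `max_batch` files per call.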

Re: parallelizing the archiver

2021-09-10 Thread Robert Haas
On Fri, Sep 10, 2021 at 1:07 PM Bossart, Nathan wrote: > That being said, I think the discussion about batching is a good one > to have. If the overhead described in your SCP example is > representative of a typical archive_command, then parallelism does > seem a bit silly. I think that's pretty

Re: parallelizing the archiver

2021-09-10 Thread Bossart, Nathan
On 9/10/21, 10:12 AM, "Robert Haas" wrote: > If on the other hand you imagine a system that's not very busy, say 1 > WAL file being archived every 10 seconds, then using a batch size of > 30 would very significantly delay removal of old files. However, on > this system, batching probably isn't rea

Re: parallelizing the archiver

2021-09-10 Thread Robert Haas
On Fri, Sep 10, 2021 at 11:49 AM Julien Rouhaud wrote: > I totally agree that batching as many files as possible in a single > command is probably what's gonna achieve the best performance. But if > the archiver only gets an answer from the archive_command once it > tried to process all of the fil

Re: parallelizing the archiver

2021-09-10 Thread Jacob Champion
On Fri, 2021-09-10 at 23:48 +0800, Julien Rouhaud wrote: > I totally agree that batching as many files as possible in a single > command is probably what's gonna achieve the best performance. But if > the archiver only gets an answer from the archive_command once it > tried to process all of the fi

Re: parallelizing the archiver

2021-09-10 Thread Bossart, Nathan
On 9/10/21, 8:22 AM, "Robert Haas" wrote: > On Fri, Sep 10, 2021 at 10:19 AM Julien Rouhaud wrote: >> Those approaches don't really seem mutually exclusive? In both cases >> you will need to internally track the status of each WAL file and >> handle non contiguous file sequences. In case of par

Re: parallelizing the archiver

2021-09-10 Thread Andrey Borodin
> On 10 Sep 2021, at 19:19, Julien Rouhaud wrote: > Wouldn't it be better to > have a new archive_mode, e.g. "daemon", and have postgres responsible > to (re)start it, and pass information through the daemon's > stdin/stdout or something like that? We don't even need to introduce new arc

Re: parallelizing the archiver

2021-09-10 Thread Julien Rouhaud
On Fri, Sep 10, 2021 at 11:22 PM Robert Haas wrote: > > Well, I guess I'm not convinced. Perhaps people with more knowledge of > this than I may already know why it's beneficial, but in my experience > commands like 'cp' and 'scp' are usually limited by the speed of I/O, > not the fact that you on

Re: parallelizing the archiver

2021-09-10 Thread Robert Haas
On Fri, Sep 10, 2021 at 10:19 AM Julien Rouhaud wrote: > Those approaches don't really seem mutually exclusive? In both cases > you will need to internally track the status of each WAL file and > handle non contiguous file sequences. In case of parallel commands > you only need additional knowle

Re: parallelizing the archiver

2021-09-10 Thread Julien Rouhaud
On Fri, Sep 10, 2021 at 9:13 PM Robert Haas wrote: > > To me, it seems way more beneficial to think about being able to > invoke archive_command with many files at a time instead of just one. > I think for most plausible archive commands that would be way more > efficient than what you propose her

Re: parallelizing the archiver

2021-09-10 Thread Robert Haas
On Tue, Sep 7, 2021 at 6:36 PM Bossart, Nathan wrote: > Based on previous threads I've seen, I believe many in the community > would like to replace archive_command entirely, but what I'm proposing > here would build on the existing tools. I'm currently thinking of > something a bit like autovacu

Re: parallelizing the archiver

2021-09-09 Thread Andrey Borodin
> On 10 Sep 2021, at 11:11, Julien Rouhaud wrote: > > On Fri, Sep 10, 2021 at 2:03 PM Andrey Borodin wrote: >> >>> On 10 Sep 2021, at 10:52, Julien Rouhaud wrote: >>> >>> Yes, but it also means that it's up to every single archiving tool to >>> implement a somewhat hackish para

Re: parallelizing the archiver

2021-09-09 Thread Julien Rouhaud
On Fri, Sep 10, 2021 at 2:03 PM Andrey Borodin wrote: > > > On 10 Sep 2021, at 10:52, Julien Rouhaud wrote: > > > > Yes, but it also means that it's up to every single archiving tool to > > implement a somewhat hackish parallel version of an archive_command, > > hoping that core won't break

Re: parallelizing the archiver

2021-09-09 Thread Andrey Borodin
> On 10 Sep 2021, at 10:52, Julien Rouhaud wrote: > > On Fri, Sep 10, 2021 at 1:28 PM Andrey Borodin wrote: >> >> It's OK if external tool is responsible for concurrency. Do we want this >> complexity in core? Many users do not enable archiving at all. >> Maybe just add parallelism AP

Re: parallelizing the archiver

2021-09-09 Thread Julien Rouhaud
On Fri, Sep 10, 2021 at 1:28 PM Andrey Borodin wrote: > > It's OK if external tool is responsible for concurrency. Do we want this > complexity in core? Many users do not enable archiving at all. > Maybe just add parallelism API for external tool? > It's much easier to control concurrency in exte

Re: parallelizing the archiver

2021-09-09 Thread Andrey Borodin
> On 8 Sep 2021, at 03:36, Bossart, Nathan wrote: > > Anyway, I'm curious what folks think about this. I think it'd help > simplify server administration for many users. BTW this thread is also related [0]. My 2 cents. It's OK if external tool is responsible for concurrency. Do we wan

Re: parallelizing the archiver

2021-09-09 Thread Julien Rouhaud
On Fri, Sep 10, 2021 at 6:30 AM Bossart, Nathan wrote: > > Thanks for chiming in. I'm planning to work on a patch next week. Great news! About the technical concerns: > I'm currently thinking of > something a bit like autovacuum_max_workers, but the archive workers > would be created once and

Re: parallelizing the archiver

2021-09-09 Thread Bossart, Nathan
On 9/7/21, 11:38 PM, "Julien Rouhaud" wrote: > On Wed, Sep 8, 2021 at 6:36 AM Bossart, Nathan wrote: >> >> I'd like to gauge interest in parallelizing the archiver process. >> [...] >> Based on previous threads I've seen, I believe many in the com

Re: parallelizing the archiver

2021-09-07 Thread Julien Rouhaud
On Wed, Sep 8, 2021 at 6:36 AM Bossart, Nathan wrote: > > I'd like to gauge interest in parallelizing the archiver process. > [...] > Based on previous threads I've seen, I believe many in the community > would like to replace archive_command entirely, but what I'm p

parallelizing the archiver

2021-09-07 Thread Bossart, Nathan
Hi hackers, I'd like to gauge interest in parallelizing the archiver process. From a quick scan, I was only able to find one recent thread [0] that brought up this topic, and ISTM the conventional wisdom is to use a backup utility like pgBackRest that does things in parallel behind-the-s