Re: optimizing pg_upgrade's once-in-each-database steps

2024-09-16 Thread Nathan Bossart
On Mon, Sep 09, 2024 at 11:20:28PM +0200, Daniel Gustafsson wrote: >> On 9 Sep 2024, at 21:17, Nathan Bossart wrote: >> >> On Thu, Sep 05, 2024 at 01:32:34PM +0200, Daniel Gustafsson wrote: >>> I've read and tested through the latest version of this patchset and I think >>> it's ready to go in. >

Re: optimizing pg_upgrade's once-in-each-database steps

2024-09-09 Thread Daniel Gustafsson
> On 9 Sep 2024, at 21:17, Nathan Bossart wrote: > > On Thu, Sep 05, 2024 at 01:32:34PM +0200, Daniel Gustafsson wrote: >> I've read and tested through the latest version of this patchset and I think >> it's ready to go in. > > Thanks for reviewing. I'm aiming to commit it later this week. +1.

Re: optimizing pg_upgrade's once-in-each-database steps

2024-09-09 Thread Nathan Bossart
On Thu, Sep 05, 2024 at 01:32:34PM +0200, Daniel Gustafsson wrote: > I've read and tested through the latest version of this patchset and I think > it's ready to go in. Thanks for reviewing. I'm aiming to commit it later this week. > The one concern I have is that tasks can exit(1) on libpq > er

Re: optimizing pg_upgrade's once-in-each-database steps

2024-09-05 Thread Daniel Gustafsson
I've read and tested through the latest version of this patchset and I think it's ready to go in. The one concern I have is that tasks can exit(1) on libpq errors tearing down perfectly functional connections without graceful shutdown. Longer term I think it would make sense to add similar exit ha

Re: optimizing pg_upgrade's once-in-each-database steps

2024-09-03 Thread Nathan Bossart
On Wed, Sep 04, 2024 at 12:28:23AM +0100, Ilya Gladyshev wrote: > The fix looks right to me, but I got confused by the skip_wait and this > `if`: > > +    if (PQstatus(slot->conn) != CONNECTION_OK) > +    return; > > This branch checks connection status that hasn't been refres

Re: optimizing pg_upgrade's once-in-each-database steps

2024-09-03 Thread Ilya Gladyshev
On 01.09.2024 22:05, Nathan Bossart wrote: I think we can actually just use PQstatus() here. But furthermore, I think the way I was initiating connections was completely bogus. IIUC before calling PQconnectPoll() the first time, we should wait for a write indicator from select(), and then we s

Re: optimizing pg_upgrade's once-in-each-database steps

2024-09-03 Thread Nathan Bossart
In v12, I've moved the "queries" PQExpBuffer up to the UpgradeTask struct so that we don't need to rebuild it for every database. I think this patch set is in reasonable state, and I still plan to commit it this month. -- nathan >From 7b3a26bb8e418bdf1920f5fe9fe97afd2939d33a Mon Sep 17 00:00:00

Re: optimizing pg_upgrade's once-in-each-database steps

2024-09-01 Thread Nathan Bossart
On Sat, Aug 31, 2024 at 01:18:10AM +0100, Ilya Gladyshev wrote: > LGTM in general, but here are some final nitpicks: Thanks for reviewing. > + if (maxFd != 0) > + (void) select(maxFd + 1, &input_mask, &output_mask, > &except_mask, NULL); > > It´s a good idea to check for the ret

Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-30 Thread Ilya Gladyshev
LGTM in general, but here are some final nitpicks: + if (maxFd != 0) + (void) select(maxFd + 1, &input_mask, &output_mask, &except_mask, NULL); It’s a good idea to check for the return value of select, in case it returns any errors. + dbs_complete++; +

Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-28 Thread Nathan Bossart
I spent some time cleaning up names, comments, etc. Barring additional feedback, I'm planning to commit this stuff in the September commitfest so that it has plenty of time to bake in the buildfarm. -- nathan >From c38137cd2cbc45c348e8e144f864a62c08bd7b7f Mon Sep 17 00:00:00 2001 From: Nathan Bo

Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-15 Thread Nathan Bossart
On Sat, Aug 10, 2024 at 10:35:46AM -0500, Nathan Bossart wrote: > Another option might be to combine all the queries for a task into a single > string and then send that in one PQsendQuery() call. That may be a simpler > way to eliminate the time between queries. I tried this out and didn't see a

Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-10 Thread Nathan Bossart
On Sat, Aug 10, 2024 at 10:17:27AM -0500, Nathan Bossart wrote: > On Fri, Aug 09, 2024 at 04:06:16PM -0400, Corey Huinker wrote: >>> Furthermore, most of the callbacks should do almost nothing for a given >>> upgrade, and since pg_upgrade runs on the server, client/server round-trip >>> time should

Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-10 Thread Nathan Bossart
On Fri, Aug 09, 2024 at 04:06:16PM -0400, Corey Huinker wrote: >> I'll admit I hadn't really considered pipelining, but I'm tempted to say >> that it's probably not worth the complexity. Not only do most of the tasks >> have only one step, but even tasks like the data types check are unlikely >> t

Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-09 Thread Corey Huinker
> > I'll admit I hadn't really considered pipelining, but I'm tempted to say > that it's probably not worth the complexity. Not only do most of the tasks > have only one step, but even tasks like the data types check are unlikely > to require more than a few queries for upgrades from supported ver

Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-09 Thread Nathan Bossart
On Thu, Aug 08, 2024 at 06:18:38PM -0400, Corey Huinker wrote: > I think the underlying mechanism is basically solid, but I have one > question: isn't this the ideal case for using libpq pipelining? That would > allow subsequent tasks to launch while the main loop slowly gets around to > clearing o

Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-08 Thread Corey Huinker
On Tue, Aug 6, 2024 at 3:20 PM Nathan Bossart wrote: > On Sun, Aug 04, 2024 at 07:19:57PM +0100, Ilya Gladyshev wrote: > > -- End of review -- > > Thanks for the review. I've attempted to address all your feedback in v8 > of the patch set. I think the names could still use some work, but I > wa

Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-06 Thread Nathan Bossart
On Sun, Aug 04, 2024 at 07:19:57PM +0100, Ilya Gladyshev wrote: > -- End of review -- Thanks for the review. I've attempted to address all your feedback in v8 of the patch set. I think the names could still use some work, but I wanted to get the main structure in place before trying to fix them.

Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-04 Thread Ilya Gladyshev
On 01.08.2024 22:41, Nathan Bossart wrote: Here is a new patch set. Besides rebasing, I've added the recursive call to process_slot() mentioned in the quoted text, and I've added quite a bit of commentary to async.c. That's much better now, thanks! Here's my code review, note that I haven't t

Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-01 Thread Nathan Bossart
On Thu, Aug 01, 2024 at 12:44:35PM -0500, Nathan Bossart wrote: > On Wed, Jul 31, 2024 at 10:55:33PM +0100, Ilya Gladyshev wrote: >> I like your idea of parallelizing these checks with async libpq API, thanks >> for working on it. The patch doesn't apply cleanly on master anymore, but >> I've rebas

Re: optimizing pg_upgrade's once-in-each-database steps

2024-08-01 Thread Nathan Bossart
On Wed, Jul 31, 2024 at 10:55:33PM +0100, Ilya Gladyshev wrote: > I like your idea of parallelizing these checks with async libpq API, thanks > for working on it. The patch doesn't apply cleanly on master anymore, but > I've rebased locally and taken it for a quick spin with a pg16 instance of > 10

Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-31 Thread Ilya Gladyshev
On 22.07.2024 21:07, Nathan Bossart wrote: On Fri, Jul 19, 2024 at 04:21:37PM -0500, Nathan Bossart wrote: However, while looking into this, I noticed that only one get_query callback (get_db_subscription_count()) actually customizes the generated query using information in the provided DbInfo

Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-26 Thread Nathan Bossart
On Thu, Jul 25, 2024 at 11:42:55AM +0200, Daniel Gustafsson wrote: >> On 25 Jul 2024, at 04:58, Nathan Bossart wrote: >> Here is just the live_check patch. I rebased it, gave it a commit message, >> and fixed a silly mistake. Barring objections or cfbot indigestion, I plan >> to commit this with

Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-25 Thread Daniel Gustafsson
> On 25 Jul 2024, at 04:58, Nathan Bossart wrote: > > On Mon, Jul 22, 2024 at 03:07:10PM -0500, Nathan Bossart wrote: >> Here is a new patch set. I've included the latest revision of the patch to >> fix get_db_subscription_count() from the other thread [0] as 0001 since I >> expect that to be co

Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-24 Thread Nathan Bossart
On Mon, Jul 22, 2024 at 03:07:10PM -0500, Nathan Bossart wrote: > Here is a new patch set. I've included the latest revision of the patch to > fix get_db_subscription_count() from the other thread [0] as 0001 since I > expect that to be committed soon. I've also moved the patch that moves the > "

Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-22 Thread Nathan Bossart
On Fri, Jul 19, 2024 at 04:21:37PM -0500, Nathan Bossart wrote: > However, while looking into this, I noticed that only one get_query > callback (get_db_subscription_count()) actually customizes the generated > query using information in the provided DbInfo. AFAICT we can do this > particular step

Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-19 Thread Nathan Bossart
On Thu, Jul 18, 2024 at 09:57:23AM +0200, Daniel Gustafsson wrote: >> On 17 Jul 2024, at 23:32, Nathan Bossart wrote: >> On Wed, Jul 17, 2024 at 11:16:59PM +0200, Daniel Gustafsson wrote: > >>> +static void >>> +dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot, >>> >>> + pg_fr

Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-18 Thread Daniel Gustafsson
> On 17 Jul 2024, at 23:32, Nathan Bossart wrote: > On Wed, Jul 17, 2024 at 11:16:59PM +0200, Daniel Gustafsson wrote: >> +static void >> +dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot, >> >> + pg_free(query); >> +} >> >> A minor point, perhaps fueled by me not having play

Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-17 Thread Nathan Bossart
On Wed, Jul 17, 2024 at 11:16:59PM +0200, Daniel Gustafsson wrote: > First reaction after a read-through is that this seems really cool, can't wait > to see how much v18 pg_upgrade will be over v17. I will do more testing and > review once back from vacation, below are some comments from reading w

Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-17 Thread Daniel Gustafsson
> On 9 Jul 2024, at 05:33, Nathan Bossart wrote: > The code is still very rough and nowhere near committable, but this at > least gets the patch set into the editing phase. First reaction after a read-through is that this seems really cool, can't wait to see how much v18 pg_upgrade will be over

Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-08 Thread Nathan Bossart
I finished parallelizing everything in pg_upgrade I could easily parallelize with the proposed async task API. There are a few remaining places where I could not easily convert to the new API for whatever reason. AFAICT those remaining places are either not showing up prominently in my tests, or t

Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-08 Thread Nathan Bossart
As I mentioned elsewhere [0], here's a first attempt at parallelizing the data type checks. I was worried that I might have to refactor Daniel's work in commit 347758b quite significantly, but I was able to avoid that by using a set of generic callbacks and providing each task step an index to the

Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-01 Thread Nathan Bossart
On Mon, Jul 01, 2024 at 03:58:16PM -0400, Robert Haas wrote: > On Mon, Jul 1, 2024 at 3:47 PM Nathan Bossart > wrote: >> 0001 introduces a new API for registering callbacks and running them in >> parallel on all databases in the cluster. This new system manages a set of >> "slots" that follow a

Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-01 Thread Robert Haas
On Mon, Jul 1, 2024 at 3:47 PM Nathan Bossart wrote: > 0001 introduces a new API for registering callbacks and running them in > parallel on all databases in the cluster. This new system manages a set of > "slots" that follow a simple state machine to asynchronously establish a > connection and r

Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-01 Thread Nathan Bossart
I figured I'd post what I have so far since this thread hasn't been updated in a while. The attached patches are still "proof-of-concept grade," but they are at least moving in the right direction (IMHO). The variable naming is still not great, and they are woefully undercommented, among other th

Re: optimizing pg_upgrade's once-in-each-database steps

2024-05-16 Thread Nathan Bossart
On Thu, May 16, 2024 at 05:09:55PM -0700, Jeff Davis wrote: > How much complexity do you avoid by using async instead of multiple > processes? If we didn't want to use async, my guess is we'd want to use threads to avoid complicated IPC. And if we followed pgbench's example for using threads, it

Re: optimizing pg_upgrade's once-in-each-database steps

2024-05-16 Thread Jeff Davis
On Thu, 2024-05-16 at 16:16 -0500, Nathan Bossart wrote: > I am posting this early to get thoughts on the general approach.  If > we > proceeded with this strategy, I'd probably create some generic > tooling that > each relevant step would provide a set of callback functions. The documentation sta

optimizing pg_upgrade's once-in-each-database steps

2024-05-16 Thread Nathan Bossart
A number of pg_upgrade steps require connecting to each database and running a query. When there are many databases, these steps are particularly time-consuming, especially since this is done sequentially in a single process. At a quick glance, I see the following such steps: * create_lo