On Fri, Aug 09, 2024 at 04:06:16PM -0400, Corey Huinker wrote:
>> I'll admit I hadn't really considered pipelining, but I'm tempted to say
>> that it's probably not worth the complexity. Not only do most of the tasks
>> have only one step, but even tasks like the data types check are unlikely
>> to require more than a few queries for upgrades from supported versions.
>
> Can you point me to a complex multi-step task that you think wouldn't work
> for pipelining? My skimming of the other patches all seemed to be one query
> with one result set to be processed by one callback.

I think it would work fine. I'm just not sure it's worth it, especially for
tasks that run exactly one query in each connection.

>> Furthermore, most of the callbacks should do almost nothing for a given
>> upgrade, and since pg_upgrade runs on the server, client/server round-trip
>> time should be pretty low.
>
> To my mind, that makes pipelining make more sense, you throw out N queries,
> most of which are trivial, and by the time you cycle back around and start
> digesting result sets via callbacks, more of the queries have finished
> because they were waiting on the query ahead of them in the pipeline, not
> waiting on a callback to finish consuming its assigned result set and then
> launching the next task query.

My assumption is that the "waiting for a callback before launching the next
query" time will typically be pretty short in practice. I could try
measuring it...

>> Perhaps pipelining would make more sense if we consolidated the tasks a bit
>> better, but when I last looked into that, I didn't see a ton of great
>> opportunities that would help anything except for upgrades from really old
>> versions. Even then, I'm not sure if pipelining is worth it.
>
> I think you'd want to do the opposite of consolidating the tasks. If
> anything, you'd want to break them down in known single-query operations,
> and if the callback function for one of them happens to queue up a
> subsequent query (with subsequent callback) then so be it.

By "consolidating," I mean combining tasks into fewer tasks with additional
steps. This would allow us to reuse connections instead of creating N
connections for every single query. If we used a task per query, I'd expect
pipelining to provide zero benefit.

--
nathan