Hi,

On 2025-02-21 15:49:10 -0500, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
> > Which to me rather strongly suggests pg_dump has gotten a *lot* slower with
> > this change.
>
> Well, it's doing strictly more work, so somewhat slower is to be
> expected.
Yea, if we were talking about a few percent, I'd not have balked. It's more
like 2-4x though, and it'll probably be worse when not connecting over a
local TCP connection. This is a slowdown to the point that the downtime for
pg_upgrade will be substantially lengthened compared to before.

But I think we should be able to address that to a large degree.

> In a quick look at the committed patch, it doesn't seem to have
> used any of the speedup strategies we applied to pg_dump a couple
> of years ago. One or the other of these should help:
>
> * Issue a single query to fetch stats from every table we're dumping
> * Set up a prepared query to avoid re-planning the per-table query
> (compare be85727a3)
>
> I'm not sure how workable the first of these would be though.
> It's not hard to imagine it blowing out pg_dump's memory usage
> for a DB with a lot of tables and high default_statistics_target.

We could presumably do the one-query approach for the relation stats, since
that's just three integers per table. That way we'd at least not end up with
two queries for each table (one for pg_class.reltuples etc. and one for
pg_stats).

I guess the memory usage could also be addressed by using COPY, but that's
probably unrealistically complicated.

> The second one should be relatively downside-free.

Yea. And at least with pg_dump running locally, that's where a lot of the
CPU time is spent.

Doing lots of one-by-one queries remotely will hurt even with prepared
statements though. One way to largely address that would be to use a
prepared statement combined with libpq pipelining (rough sketch in the PS
below). That still has separate executor startup etc., but I think it should
reduce the cost to a point where we don't care anymore.

Greetings,

Andres Freund
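
PS: To illustrate the shape of the pipelined approach, here's a rough sketch
of what the client side could look like. This is not the committed pg_dump
code; the helper name, the batching interface and the pg_class query are
placeholders. The point is one Parse and one network round trip per batch of
tables instead of one per table, with the results drained afterwards.

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

/*
 * Fetch relation-level stats (reltuples, relpages, relallvisible) for a
 * batch of tables with a single network round trip: one Parse, then one
 * Bind/Execute per table, then one Sync.  Executor startup still happens
 * once per table on the server, but the client no longer waits out a
 * round trip per table.  "reloids" are the tables' OIDs as strings.
 */
static void
fetch_relstats_pipelined(PGconn *conn, const char *const *reloids, int nrels)
{
	PGresult   *res;

	if (PQenterPipelineMode(conn) != 1)
	{
		fprintf(stderr, "could not enter pipeline mode: %s", PQerrorMessage(conn));
		exit(1);
	}

	/* Parse/plan the per-table query only once; let the server infer $1. */
	if (PQsendPrepare(conn, "relstats",
					  "SELECT reltuples, relpages, relallvisible "
					  "FROM pg_catalog.pg_class WHERE oid = $1",
					  0, NULL) != 1)
	{
		fprintf(stderr, "PQsendPrepare failed: %s", PQerrorMessage(conn));
		exit(1);
	}

	/*
	 * Queue one execution per table without waiting for results.  Real code
	 * would cap the batch size (or use nonblocking mode) to avoid both sides
	 * blocking on full buffers.
	 */
	for (int i = 0; i < nrels; i++)
	{
		const char *values[1] = {reloids[i]};

		if (PQsendQueryPrepared(conn, "relstats", 1, values,
								NULL, NULL, 0) != 1)
		{
			fprintf(stderr, "PQsendQueryPrepared failed: %s", PQerrorMessage(conn));
			exit(1);
		}
	}

	/* One Sync sends the whole batch and asks the server to answer it. */
	if (PQpipelineSync(conn) != 1)
	{
		fprintf(stderr, "PQpipelineSync failed: %s", PQerrorMessage(conn));
		exit(1);
	}

	/*
	 * Drain the results: the Prepare plus one result set per table, each
	 * terminated by a NULL from PQgetResult, then the PGRES_PIPELINE_SYNC
	 * marking the end of the batch.
	 */
	for (int q = 0; q < nrels + 1; q++)
	{
		while ((res = PQgetResult(conn)) != NULL)
		{
			if (PQresultStatus(res) == PGRES_TUPLES_OK)
			{
				/* one row: reltuples, relpages, relallvisible */
			}
			else if (PQresultStatus(res) != PGRES_COMMAND_OK)
				fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
			PQclear(res);
		}
	}

	res = PQgetResult(conn);
	if (res == NULL || PQresultStatus(res) != PGRES_PIPELINE_SYNC)
		fprintf(stderr, "unexpected end of pipeline: %s", PQerrorMessage(conn));
	PQclear(res);

	PQexitPipelineMode(conn);
}

For the pg_stats side the query and result handling would obviously be more
involved, but the control flow would look the same.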