Hi,
On 2025-03-05 23:04:29 -0500, Corey Huinker wrote:
> > > I'm uncertain how we'd do that with (schemaname,tablename) pairs. Are you
> > > suggesting we back the joins from pg_stats to pg_namespace and pg_class
> > > and then filter by oids?
> >
> > I was thinking of one query per schema or something like that. But yea, a
> > query to pg_namespace and pg_class wouldn't be a problem if we did it far
> > fewer times than before. Or you could put the list of catalogs / tables to
> > be queried into an unnest() with two arrays or such.
> >
> > Not sure how good the query plan for that would be, but it may be worth
> > looking at.
>
> Ok, so we're willing to take the pg_class/pg_namespace join hit for one or
> a handful of queries, good to know.

It's a tradeoff that needs to be evaluated. But I'd be rather surprised if
it weren't faster to run one query with the additional joins than hundreds
of queries without them.

> > > Each call to getAttributeStats() fetches the pg_stats for one and only
> > > one relation and then writes the SQL call to fout, then discards the
> > > result set once all the attributes of the relation are done.
> >
> > I don't think that's true. For one, my example demonstrated that it
> > increases the peak memory usage substantially. That'd not be the case if
> > the data was just written out to stdout or such.
> >
> > Looking at the code confirms that. The ArchiveEntry() in
> > dumpRelationStats() is never freed, afaict. And ArchiveEntry() strdups
> > ->createStmt, which contains the "SELECT pg_restore_attribute_stats(...)".
>
> Pardon my inexperience, but aren't the ArchiveEntry records needed right up
> until the program's run?

s/the/the end of the/?

> If there's value in freeing them, why isn't it being done already? What
> other thing would consume this freed memory?

I'm not saying that they can be freed; they can't right now.
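(As an aside, the unnest()-with-two-arrays variant mentioned above could look
roughly like the sketch below; the $1/$2 parameters and the join conditions
are just illustrative, and the real query would select specific columns
rather than s.*.)

```sql
-- Sketch only: fetch stats for many relations in one query by passing
-- the schema names and table names as two parallel arrays, instead of
-- issuing one query per relation.
SELECT s.*
FROM unnest($1::text[], $2::text[]) AS t(nspname, relname)
JOIN pg_catalog.pg_stats s
  ON s.schemaname = t.nspname
 AND s.tablename = t.relname;
```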
My point is just that we *already* keep all the stats in memory, so the
fact that fetching all stats in a single query would also require keeping
them in memory is not an issue.

But TBH, I do wonder how much the current memory usage of the statistics
dump/restore support is going to bite us. In some cases this will
dramatically increase pg_dump/pg_upgrade's memory usage: my tests were with
tiny amounts of data and very simple scalar datatypes, and you already
could see a substantial increase. With something like postgis or even just
a lot of jsonb columns this is going to be way worse.

Greetings,

Andres Freund