On Wed, Mar 5, 2025 at 9:18 PM Andres Freund <and...@anarazel.de> wrote:
> Hi,
>
> On 2025-03-05 20:54:35 -0500, Corey Huinker wrote:
> > It's been considered and not ruled out, with a "let's see how the simple
> > thing works, first" approach. Considerations are:
> >
> > * pg_stats is keyed on schemaname + tablename (which can also be indexes)
> > and we need to use that because of the security barrier
>
> I don't think that has to be a big issue, you can just make the query
> query multiple tables at once using an = ANY(ARRAY[]) expression or such.

I'm uncertain how we'd do that with (schemaname, tablename) pairs. Are you
suggesting we back the joins from pg_stats to pg_namespace and pg_class and
then filter by oids?

> > * The stats data is kinda heavy (most common value lists, most common
> > elements lists, esp. for high stattargets), which would be a considerable
> > memory impact, and some of those stats might not even be needed (for
> > example, index stats for a table that is filtered out)
>
> Doesn't the code currently have this problem already? Afaict the stats are
> currently all stored in memory inside pg_dump.

Each call to getAttributeStats() fetches the pg_stats for one and only one
relation, writes the SQL call to fout, and then discards the result set once
all the attributes of the relation are done.

> I don't think the query itself would be a problem, a query querying all the
> required stats should probably use PQsetSingleRowMode() or
> PQsetChunkedRowsMode().

That makes sense if we get the attribute stats from the result set in the
order that we need them, and I don't know how we could possibly do that.
We'd still need a table to bsearch() and that would be huge.