Hi,

On 2025-03-05 20:54:35 -0500, Corey Huinker wrote:
> It's been considered and not ruled out, with a "let's see how the simple
> thing works, first" approach. Considerations are:
>
> * pg_stats is keyed on schemaname + tablename (which can also be indexes)
> and we need to use that because of the security barrier

I don't think that has to be a big issue; you can just make the query fetch the
stats for multiple tables at once using an = ANY(ARRAY[]) expression or such.
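To illustrate the shape of that, a minimal libpq sketch (the helper name, the
schema filter and the selected columns are made up here, not what pg_dump
would actually emit):

#include <libpq-fe.h>

/*
 * Fetch pg_stats rows for several tables with one query, passing the table
 * names as a text[] parameter and matching them with = ANY().  For brevity
 * this assumes all tables live in schema "public"; (schemaname, tablename)
 * pairs could be handled the same way with a second array parameter.
 */
static PGresult *
fetch_stats_for_tables(PGconn *conn, const char *tables_array_literal)
{
    /* e.g. tables_array_literal = "{pgbench_accounts,pgbench_branches}" */
    const char *params[1] = {tables_array_literal};

    return PQexecParams(conn,
                        "SELECT schemaname, tablename, attname, "
                        "       null_frac, n_distinct, most_common_vals "
                        "FROM pg_stats "
                        "WHERE schemaname = 'public' "
                        "  AND tablename = ANY($1::text[])",
                        1, NULL, params, NULL, NULL, 0);
}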


> * The stats data is kinda heavy (most common value lists, most common
> elements lists, esp for high stattargets), which would be a considerable
> memory impact and some of those stats might not even be needed (example,
> index stats for a table that is filtered out)

Doesn't the code currently have this problem already? Afaict the stats are
currently all stored in memory inside pg_dump.

$ for opt in '' --no-statistics; do
    echo "using option $opt"
    for dbname in pgbench_part_100 pgbench_part_1000 pgbench_part_10000; do
      echo $dbname
      /usr/bin/time -f 'Max RSS kB: %M' ./src/bin/pg_dump/pg_dump --no-data \
        --quote-all-identifiers --no-sync --no-data $opt $dbname -Fp > /dev/null
    done
  done

using option
pgbench_part_100
Max RSS kB: 12780
pgbench_part_1000
Max RSS kB: 22700
pgbench_part_10000
Max RSS kB: 124224
using option --no-statistics
pgbench_part_100
Max RSS kB: 12648
pgbench_part_1000
Max RSS kB: 19124
pgbench_part_10000
Max RSS kB: 85068


I don't think the query itself would be a problem; a query fetching all the
required stats should probably use PQsetSingleRowMode() or
PQsetChunkedRowsMode().
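
As a rough sketch of the retrieval loop under single-row mode (the function
name and query are placeholders; PQsetChunkedRowsMode() would look much the
same, just delivering rows in batches):

#include <stdio.h>
#include <libpq-fe.h>

/*
 * Stream the stats query row by row instead of materializing the whole
 * result set client-side.  PQsetSingleRowMode() has to be called right
 * after PQsendQuery() and before the first PQgetResult().
 */
static void
stream_stats(PGconn *conn, const char *stats_query)
{
    PGresult   *res;

    if (!PQsendQuery(conn, stats_query))
    {
        fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
        return;
    }
    if (!PQsetSingleRowMode(conn))
        fprintf(stderr, "could not switch to single-row mode\n");

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
        {
            /* one pg_stats row: hand it to whatever emits the dump entry */
            printf("%s.%s\n",
                   PQgetvalue(res, 0, 0),   /* schemaname */
                   PQgetvalue(res, 0, 1));  /* tablename */
        }
        PQclear(res);   /* the final PGRES_TUPLES_OK result is cleared too */
    }
}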

Greetings,

Andres Freund

