On Tue, 30 Jul 2024 at 10:14, Tom Lane <[email protected]> wrote:
> -----
> for (my $i = 0; $i < 100; $i++)
> {
> print "CREATE TABLE test_inh_check$i (\n";
> for (my $j = 0; $j < 1000; $j++)
> {
> print "a$j float check (a$j > 10.2),\n";
> }
> print "b float);\n";
> print "CREATE TABLE test_inh_check_child$i()
> INHERITS(test_inh_check$i);\n";
> }
> -----
>
> On my development machine, it takes over 14 minutes to pg_upgrade
> this, and it turns out that that time is largely spent in column
> name de-duplication while deparsing the CHECK constraints. The
> attached patch reduces that to about 3m45s.
I think this is worth doing. Reducing the --schema-only time in
pg_dump is a worthy goal to reduce downtime during upgrades.
I looked at the patch and tried it out. I wondered about the choice of
32 as the cut-off point so decided to benchmark using the attached
script. Here's an extract from the attached results:
Patched with 10 child tables
pg_dump for 16 columns real 0m0.068s
pg_dump for 31 columns real 0m0.080s
pg_dump for 32 columns real 0m0.083s
This gives me what I'd expect to see. I wanted to ensure the point
where you're switching to the hashing method was about the right
place. It seems to be, at least for my test.
The performance looks good too:
10 tables:
master: pg_dump for 1024 columns real 0m23.053s
patched: pg_dump for 1024 columns real 0m1.573s
100 tables:
master: pg_dump for 1024 columns real 3m29.857s
patched: pg_dump for 1024 columns real 0m23.053s
Perhaps you don't think it's worth the additional complexity, but I
see that in both locations you're calling build_colinfo_names_hash(),
it's done just after a call to expand_colnames_array_to(). I wondered
if it was worthwhile unifying both of those functions maybe with a new
name so that you don't need to loop over the always NULL element of
the colnames[] array when building the hash table. This is likely
quite a small overhead compared to the quadratic search you've
removed, so it might not move the needle any. I just wanted to point
it out as I've little else I can find to comment on.
David
tables=100
dbname=postgres
for cols in 2 4 8 16 31 32 64 128 256 512 1024
do
psql -c "drop table parent cascade;" $dbname &> /dev/null
sql="create table parent ("
for i in $(seq 1 $cols)
do
sql="${sql}col$i int check (col$i > 0)"
if [ $i -lt $cols ]; then
sql="${sql},"
fi
done
sql="${sql});"
psql -c "$sql" $dbname &> /dev/null
for i in $(seq 1 $tables)
do
psql -c "create table child$i () inherits (parent);" $dbname >
/dev/null
done
echo -n "pg_dump for $cols columns "
{ time pg_dump $dbname > /dev/null; } |& grep real
done
## AMD3990x With tables=10
master
drowley@amd3990x:~$ ./pgdump_bench.sh
pg_dump for 2 columns real 0m0.106s
pg_dump for 4 columns real 0m0.075s
pg_dump for 8 columns real 0m0.077s
pg_dump for 16 columns real 0m0.083s
pg_dump for 31 columns real 0m0.100s
pg_dump for 32 columns real 0m0.099s
pg_dump for 64 columns real 0m0.130s
pg_dump for 128 columns real 0m0.208s
pg_dump for 256 columns real 0m0.587s
pg_dump for 512 columns real 0m3.227s
pg_dump for 1024 columns real 0m23.053s
patched
drowley@amd3990x:~$ ./pgdump_bench.sh
pg_dump for 2 columns real 0m0.109s
pg_dump for 4 columns real 0m0.054s
pg_dump for 8 columns real 0m0.057s
pg_dump for 16 columns real 0m0.068s
pg_dump for 31 columns real 0m0.080s
pg_dump for 32 columns real 0m0.083s
pg_dump for 64 columns real 0m0.105s
pg_dump for 128 columns real 0m0.153s
pg_dump for 256 columns real 0m0.272s
pg_dump for 512 columns real 0m0.558s
pg_dump for 1024 columns real 0m1.573s
## AMD3990x With tables=100
master
drowley@amd3990x:~$ ./pgdump_bench.sh
pg_dump for 2 columns real 0m0.150s
pg_dump for 4 columns real 0m0.079s
pg_dump for 8 columns real 0m0.095s
pg_dump for 16 columns real 0m0.112s
pg_dump for 31 columns real 0m0.134s
pg_dump for 32 columns real 0m0.136s
pg_dump for 64 columns real 0m0.278s
pg_dump for 128 columns real 0m0.817s
pg_dump for 256 columns real 0m3.998s
pg_dump for 512 columns real 0m27.443s
pg_dump for 1024 columns real 3m29.857s
patched
drowley@amd3990x:~$ ./pgdump_bench.sh
pg_dump for 2 columns real 0m0.106s
pg_dump for 4 columns real 0m0.075s
pg_dump for 8 columns real 0m0.077s
pg_dump for 16 columns real 0m0.083s
pg_dump for 31 columns real 0m0.100s
pg_dump for 32 columns real 0m0.099s
pg_dump for 64 columns real 0m0.130s
pg_dump for 128 columns real 0m0.208s
pg_dump for 256 columns real 0m0.587s
pg_dump for 512 columns real 0m3.227s
pg_dump for 1024 columns real 0m23.053s