On Tue, 30 Jul 2024 at 10:14, Tom Lane <t...@sss.pgh.pa.us> wrote: > ----- > for (my $i = 0; $i < 100; $i++) > { > print "CREATE TABLE test_inh_check$i (\n"; > for (my $j = 0; $j < 1000; $j++) > { > print "a$j float check (a$j > 10.2),\n"; > } > print "b float);\n"; > print "CREATE TABLE test_inh_check_child$i() > INHERITS(test_inh_check$i);\n"; > } > ----- > > On my development machine, it takes over 14 minutes to pg_upgrade > this, and it turns out that that time is largely spent in column > name de-duplication while deparsing the CHECK constraints. The > attached patch reduces that to about 3m45s.
I think this is worth doing. Reducing the --schema-only time in pg_dump is a worthy goal to reduce downtime during upgrades. I looked at the patch and tried it out. I wondered about the choice of 32 as the cut-off point so decided to benchmark using the attached script. Here's an extract from the attached results: Patched with 10 child tables pg_dump for 16 columns real 0m0.068s pg_dump for 31 columns real 0m0.080s pg_dump for 32 columns real 0m0.083s This gives me what I'd expect to see. I wanted to ensure the point where you're switching to the hashing method was about the right place. It seems to be, at least for my test. The performance looks good too: 10 tables: master: pg_dump for 1024 columns real 0m23.053s patched: pg_dump for 1024 columns real 0m1.573s 100 tables: master: pg_dump for 1024 columns real 3m29.857s patched: pg_dump for 1024 columns real 0m23.053s Perhaps you don't think it's worth the additional complexity, but I see that in both locations you're calling build_colinfo_names_hash(), it's done just after a call to expand_colnames_array_to(). I wondered if it was worthwhile unifying both of those functions maybe with a new name so that you don't need to loop over the always NULL element of the colnames[] array when building the hash table. This is likely quite a small overhead compared to the quadratic search you've removed, so it might not move the needle any. I just wanted to point it out as I've little else I can find to comment on. David
tables=100 dbname=postgres for cols in 2 4 8 16 31 32 64 128 256 512 1024 do psql -c "drop table parent cascade;" $dbname &> /dev/null sql="create table parent (" for i in $(seq 1 $cols) do sql="${sql}col$i int check (col$i > 0)" if [ $i -lt $cols ]; then sql="${sql}," fi done sql="${sql});" psql -c "$sql" $dbname &> /dev/null for i in $(seq 1 $tables) do psql -c "create table child$i () inherits (parent);" $dbname > /dev/null done echo -n "pg_dump for $cols columns " { time pg_dump $dbname > /dev/null; } |& grep real done
## AMD3990x With tables=10 master drowley@amd3990x:~$ ./pgdump_bench.sh pg_dump for 2 columns real 0m0.106s pg_dump for 4 columns real 0m0.075s pg_dump for 8 columns real 0m0.077s pg_dump for 16 columns real 0m0.083s pg_dump for 31 columns real 0m0.100s pg_dump for 32 columns real 0m0.099s pg_dump for 64 columns real 0m0.130s pg_dump for 128 columns real 0m0.208s pg_dump for 256 columns real 0m0.587s pg_dump for 512 columns real 0m3.227s pg_dump for 1024 columns real 0m23.053s patched drowley@amd3990x:~$ ./pgdump_bench.sh pg_dump for 2 columns real 0m0.109s pg_dump for 4 columns real 0m0.054s pg_dump for 8 columns real 0m0.057s pg_dump for 16 columns real 0m0.068s pg_dump for 31 columns real 0m0.080s pg_dump for 32 columns real 0m0.083s pg_dump for 64 columns real 0m0.105s pg_dump for 128 columns real 0m0.153s pg_dump for 256 columns real 0m0.272s pg_dump for 512 columns real 0m0.558s pg_dump for 1024 columns real 0m1.573s ## AMD3990x With tables=100 master drowley@amd3990x:~$ ./pgdump_bench.sh pg_dump for 2 columns real 0m0.150s pg_dump for 4 columns real 0m0.079s pg_dump for 8 columns real 0m0.095s pg_dump for 16 columns real 0m0.112s pg_dump for 31 columns real 0m0.134s pg_dump for 32 columns real 0m0.136s pg_dump for 64 columns real 0m0.278s pg_dump for 128 columns real 0m0.817s pg_dump for 256 columns real 0m3.998s pg_dump for 512 columns real 0m27.443s pg_dump for 1024 columns real 3m29.857s patched drowley@amd3990x:~$ ./pgdump_bench.sh pg_dump for 2 columns real 0m0.106s pg_dump for 4 columns real 0m0.075s pg_dump for 8 columns real 0m0.077s pg_dump for 16 columns real 0m0.083s pg_dump for 31 columns real 0m0.100s pg_dump for 32 columns real 0m0.099s pg_dump for 64 columns real 0m0.130s pg_dump for 128 columns real 0m0.208s pg_dump for 256 columns real 0m0.587s pg_dump for 512 columns real 0m3.227s pg_dump for 1024 columns real 0m23.053s