On Tue, 30 Jul 2024 at 10:14, Tom Lane <t...@sss.pgh.pa.us> wrote:
> -----
> for (my $i = 0; $i < 100; $i++)
> {
>         print "CREATE TABLE test_inh_check$i (\n";
>         for (my $j = 0; $j < 1000; $j++)
>         {
>                 print "a$j float check (a$j > 10.2),\n";
>         }
>         print "b float);\n";
>         print "CREATE TABLE test_inh_check_child$i() 
> INHERITS(test_inh_check$i);\n";
> }
> -----
>
> On my development machine, it takes over 14 minutes to pg_upgrade
> this, and it turns out that that time is largely spent in column
> name de-duplication while deparsing the CHECK constraints.  The
> attached patch reduces that to about 3m45s.

I think this is worth doing. Reducing the --schema-only time in
pg_dump is a worthy goal to reduce downtime during upgrades.

I looked at the patch and tried it out. I wondered about the choice of
32 as the cut-off point so decided to benchmark using the attached
script. Here's an extract from the attached results:

Patched with 10 child tables
pg_dump for 16 columns real     0m0.068s
pg_dump for 31 columns real     0m0.080s
pg_dump for 32 columns real     0m0.083s

This gives me what I'd expect to see. I wanted to ensure the point
where you're switching to the hashing method was about the right
place. It seems to be, at least for my test.

The performance looks good too:

10 tables:
master: pg_dump for 1024 columns real   0m23.053s
patched: pg_dump for 1024 columns real   0m1.573s

100 tables:
master: pg_dump for 1024 columns real   3m29.857s
patched: pg_dump for 1024 columns real   0m23.053s

Perhaps you don't think it's worth the additional complexity, but I
see that in both locations you're calling build_colinfo_names_hash(),
it's done just after a call to expand_colnames_array_to(). I wondered
if it was worthwhile unifying both of those functions maybe with a new
name so that you don't need to loop over the always NULL element of
the colnames[] array when building the hash table. This is likely
quite a small overhead compared to the quadratic search you've
removed, so it might not move the needle any. I just wanted to point
it out as I've little else I can find to comment on.

David
tables=100
dbname=postgres


for cols in 2 4 8 16 31 32 64 128 256 512 1024 
do
        psql -c "drop table parent cascade;" $dbname &> /dev/null
        sql="create table parent ("
        for i in $(seq 1 $cols)
        do
                sql="${sql}col$i int check (col$i > 0)"
                if [ $i -lt $cols ]; then
                        sql="${sql},"
                fi
        done
        sql="${sql});"

        psql -c "$sql" $dbname &> /dev/null

        for i in $(seq 1 $tables)
        do
                psql -c "create table child$i () inherits (parent);" $dbname > 
/dev/null
        done

        echo -n "pg_dump for $cols columns "
        { time pg_dump $dbname > /dev/null; } |& grep real
        
done

## AMD3990x  With tables=10

master
drowley@amd3990x:~$ ./pgdump_bench.sh
pg_dump for 2 columns real      0m0.106s
pg_dump for 4 columns real      0m0.075s
pg_dump for 8 columns real      0m0.077s
pg_dump for 16 columns real     0m0.083s
pg_dump for 31 columns real     0m0.100s
pg_dump for 32 columns real     0m0.099s
pg_dump for 64 columns real     0m0.130s
pg_dump for 128 columns real    0m0.208s
pg_dump for 256 columns real    0m0.587s
pg_dump for 512 columns real    0m3.227s
pg_dump for 1024 columns real   0m23.053s

patched
drowley@amd3990x:~$ ./pgdump_bench.sh
pg_dump for 2 columns real      0m0.109s
pg_dump for 4 columns real      0m0.054s
pg_dump for 8 columns real      0m0.057s
pg_dump for 16 columns real     0m0.068s
pg_dump for 31 columns real     0m0.080s
pg_dump for 32 columns real     0m0.083s
pg_dump for 64 columns real     0m0.105s
pg_dump for 128 columns real    0m0.153s
pg_dump for 256 columns real    0m0.272s
pg_dump for 512 columns real    0m0.558s
pg_dump for 1024 columns real   0m1.573s


## AMD3990x With tables=100

master
drowley@amd3990x:~$ ./pgdump_bench.sh
pg_dump for 2 columns real      0m0.150s
pg_dump for 4 columns real      0m0.079s
pg_dump for 8 columns real      0m0.095s
pg_dump for 16 columns real     0m0.112s
pg_dump for 31 columns real     0m0.134s
pg_dump for 32 columns real     0m0.136s
pg_dump for 64 columns real     0m0.278s
pg_dump for 128 columns real    0m0.817s
pg_dump for 256 columns real    0m3.998s
pg_dump for 512 columns real    0m27.443s
pg_dump for 1024 columns real   3m29.857s

patched
drowley@amd3990x:~$ ./pgdump_bench.sh
pg_dump for 2 columns real      0m0.106s
pg_dump for 4 columns real      0m0.075s
pg_dump for 8 columns real      0m0.077s
pg_dump for 16 columns real     0m0.083s
pg_dump for 31 columns real     0m0.100s
pg_dump for 32 columns real     0m0.099s
pg_dump for 64 columns real     0m0.130s
pg_dump for 128 columns real    0m0.208s
pg_dump for 256 columns real    0m0.587s
pg_dump for 512 columns real    0m3.227s
pg_dump for 1024 columns real   0m23.053s

Reply via email to