Forking https://www.postgresql.org/message-id/20210328231433.gi15...@telsasoft.com
I gave suggestion how to reduce the "lines of diff" metric almost to nothing, allowing a very small "fudge factor", and which I think makes this a pretty good metric rather than a passable one. Thoughts ? On Sun, Mar 28, 2021 at 06:14:33PM -0500, Justin Pryzby wrote: > On Sun, Mar 28, 2021 at 04:48:29PM -0400, Andrew Dunstan wrote: > > Nothing is hidden here - the diffs are reported, see for example > > <https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2021-03-28%2015%3A37%3A07&stg=xversion-upgrade-REL9_4_STABLE-HEAD> > > What we're comparing here is target pg_dumpall against the original > > source vs target pg_dumpall against the upgraded source. > > The command being run is: > > https://github.com/PGBuildFarm/client-code/blob/master/PGBuild/Modules/TestUpgradeXversion.pm#L610 > system( "diff -I '^-- ' -u $upgrade_loc/origin-$oversion.sql " > . "$upgrade_loc/converted-$oversion-to-$this_branch.sql " > . "> $upgrade_loc/dumpdiff-$oversion 2>&1"); > ... > my $difflines = `wc -l < $upgrade_loc/dumpdiff-$oversion`; > > where -I means: --ignore-matching-lines=RE > > I think wc -l should actually be grep -c '^[-+]' > otherwise context lines count for as much as diff lines. > You could write that with diff -U0 |wc -l, except the context is useful to > humans. > > With some more effort, the number of lines of diff can be very small, allowing > a smaller fudge factor. > > For upgrade from v10: > time make -C src/bin/pg_upgrade check oldsrc=`pwd`/10 > oldbindir=`pwd`/10/tmp_install/usr/local/pgsql/bin > > $ diff -u src/bin/pg_upgrade/tmp_check/dump1.sql > src/bin/pg_upgrade/tmp_check/dump2.sql |wc -l > 622 > > Without context: > $ diff -u src/bin/pg_upgrade/tmp_check/dump1.sql > src/bin/pg_upgrade/tmp_check/dump2.sql |grep -c '^[-+]' > 142 > > Without comments: > $ diff -I '^-- ' -u src/bin/pg_upgrade/tmp_check/dump1.sql > src/bin/pg_upgrade/tmp_check/dump2.sql |grep -c '^[-+]' > 130 > > Without SET default stuff: > diff -I '^$' -I "SET default_table_access_method = heap;" -I "^SET > default_toast_compression = 'pglz';$" -I '^-- ' -u > /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump1.sql > /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump2.sql |less |grep > -c '^[-+]' > 117 > > Without trigger function call noise: > diff -I "^CREATE TRIGGER [_[:alnum:]]\+ .* FOR EACH \(ROW\|STATEMENT\) > EXECUTE \(PROCEDURE\|FUNCTION\)" -I '^$' -I "SET default_table_access_method > = heap;" -I "^SET default_toast_compression = 'pglz';$" -I '^-- ' -u > /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump1.sql > /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump2.sql |grep -c > '^[-+]' > 11 > > Maybe it's important not to totally ignore that, and instead perhaps clean up > the known/accepted changes like s/FUNCTION/PROCEDURE/: > > </home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump2.sql sed > '/^CREATE TRIGGER/s/FUNCTION/PROCEDURE/' |diff -I '^$' -I "SET > default_table_access_method = heap;" -I "^SET default_toast_compression = > 'pglz';$" -I '^-- ' -u > /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump1.sql - |grep -c > '^[-+]' > 11 > > It seems weird that we don't quote "heap" but we quote tablespaces and not > toast compression methods.