On Fri, Apr 26, 2024 at 06:38:22PM +0530, Ashutosh Bapat wrote:
> Some points for discussion:
> 1. The test still hardcodes the diffs between the two dumps.  I haven't
> found a better way to do it.  I did consider removing the problematic
> objects from the regression database, but decided against it since we
> would lose some coverage.
>
> 2. The new code tests dump and restore of just the regression database
> and does not use pg_dumpall like pg_upgrade.  Should it instead perform
> pg_dumpall?  I decided against it since a. we are interested in dumping
> and restoring the objects left behind by regression, and b. I didn't
> find a way to provide the format option to pg_dumpall.  The test could
> be enhanced to use different dump formats.
>
> I have added it to the next commitfest.
> https://commitfest.postgresql.org/48/4956/
Ashutosh and I discussed this patch a bit last week.  Here is a short
summary of my input, after I understood what is going on.

+	# We could avoid this by dumping the database loaded from the original
+	# dump.  But that would change the state of the objects as left behind
+	# by the regression.
+	my $expected_diff = " --
+ CREATE TABLE public.gtestxx_4 (
+-    b integer,
+-    a integer NOT NULL
++    a integer NOT NULL,
++    b integer
+ )
[...]
+	my ($stdout, $stderr) =
+		run_command([ 'diff', '-u', $dump4_file, $dump5_file ]);
+	# Clear file names and line numbers from the diffs; those are not
+	# going to remain the same always.  Also clear empty lines and
+	# normalize new line characters across platforms.
+	$stdout =~ s/^\@\@.*$//mg;
+	$stdout =~ s/^.*$dump4_file.*$//mg;
+	$stdout =~ s/^.*$dump5_file.*$//mg;
+	$stdout =~ s/^\s*\n//mg;
+	$stdout =~ s/\r\n/\n/g;
+	$expected_diff =~ s/\r\n/\n/g;
+	is($stdout, $expected_diff, 'old and new dumps match after dump and restore');
+}

I am not a fan of what this patch does, embedding the knowledge related
to dump filtering within 002_pg_upgrade.pl.  Please do not get me
wrong: I am not against the idea of adding that within this pg_upgrade
test to save one full cycle of `make check` when checking the
consistency of the dump.  My issue is that this logic should be
externalized, and it should take fewer lines of code.

For the externalization part, Ashutosh and I considered a few ideas,
and one that we found tempting is to create a small .pm, say named
AdjustDump.pm.  This would share some rules with the existing
AdjustUpgrade.pm, which would be fine IMO even if there is a small
overlap, as long as the dependency between the two modules is
documented.  That also makes the integration with the buildfarm much
simpler by not creating more dependencies with the modules shared
between core and the buildfarm code.
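To make the normalization rules in the hunk above concrete, here is a
small sketch of what an externalized "adjust the diff" helper could
centralize.  It is written in Python purely for illustration (the real
helper would be Perl, e.g. in the proposed AdjustDump.pm), and the
function name is mine, not from the patch:

```python
import re

def adjust_dump_diff(diff_text, file_a, file_b):
    """Normalize a `diff -u` of two pg_dump outputs before comparing it
    against an expected diff: drop hunk headers and file-name lines
    (their line numbers and paths vary between runs), drop empty lines,
    and normalize newlines across platforms."""
    # Drop "@@ -N,M +N,M @@" hunk headers.
    diff_text = re.sub(r'^@@.*$', '', diff_text, flags=re.M)
    # Drop the "---"/"+++" lines naming the dump files.
    diff_text = re.sub(r'^.*' + re.escape(file_a) + r'.*$', '',
                       diff_text, flags=re.M)
    diff_text = re.sub(r'^.*' + re.escape(file_b) + r'.*$', '',
                       diff_text, flags=re.M)
    # Drop empty (or whitespace-only) lines left behind.
    diff_text = re.sub(r'^\s*\n', '', diff_text, flags=re.M)
    # Normalize Windows newlines.
    return diff_text.replace('\r\n', '\n')
```

Each substitution mirrors one of the Perl `s///mg` rules quoted above,
so the same behavior could live in one documented place instead of the
test script.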
For the "shorter" part, one idea that I had is to apply to the dump a
regexp that wipes out the column definitions within the parenthesis,
keeping around the CREATE TABLE line and any other attributes not
impacted by the reordering.  All that should be documented in the
module, of course.

Another thing would be to improve the backend so that we have better
support for physical column ordering, which would, I assume (and
correct me if I'm wrong!), prevent the reordering of the attributes in
this inheritance case.  But that would not address the case of dumps
taken from older versions with a new version of pg_dump, which is
something that may be interesting to have more tests for in the long
term.  Overall, a module sounds like a better solution.
--
Michael
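[Editor's note: as an illustration of the column-list wipe idea
mentioned above, here is one possible shape for such a regexp.  Python
is used only for sketching (the real rule would live in the Perl
AdjustDump.pm); the function name and placeholder are hypothetical,
and the pattern assumes pg_dump's layout where the column list opens
with "(" at the end of the CREATE TABLE line and closes with ")" at
the start of a line:]

```python
import re

def strip_column_lists(dump_text):
    """Replace the body of each CREATE TABLE's parenthesized column
    list with a '...' placeholder, so that attribute reordering (as in
    the inheritance case above) no longer shows up in dump-vs-dump
    diffs, while the CREATE TABLE line and trailing attributes such as
    INHERITS (...) are kept for comparison."""
    return re.sub(
        # Group 1: the "CREATE TABLE name (" line; group 2: the ")"
        # that closes the column list at the start of a line.
        r'(^CREATE TABLE [^\n]*\(\n).*?(^\))',
        r'\1...\n\2',
        dump_text,
        flags=re.M | re.S)
```

With a rule like this, the hardcoded $expected_diff in the test could
shrink or disappear, since both dumps would normalize to the same text.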