On Tue, Jun 4, 2024 at 4:28 AM Michael Paquier <mich...@paquier.xyz> wrote:
> On Fri, Apr 26, 2024 at 06:38:22PM +0530, Ashutosh Bapat wrote:
> > Some points for discussion:
> > 1. The test still hardcodes the diffs between two dumps. I haven't
> > found a better way to do it. I did consider removing the problematic
> > objects from the regression database but decided against it since we
> > would lose some coverage.
> >
> > 2. The new code tests dump and restore of just the regression
> > database and does not use pg_dumpall like pg_upgrade. Should it
> > instead perform pg_dumpall? I decided against it since a. we are
> > interested in dumping and restoring objects left behind by
> > regression, and b. I didn't find a way to provide the format option
> > to pg_dumpall. The test could be enhanced to use different dump
> > formats.
> >
> > I have added it to the next commitfest.
> > https://commitfest.postgresql.org/48/4956/
>
> Ashutosh and I have discussed this patch a bit last week. Here is a
> short summary of my input, after I understood what is going on.
>
> +    # We could avoid this by dumping the database loaded from the
> +    # original dump. But that would change the state of the objects
> +    # as left behind by the regression.
> +    my $expected_diff = " --
> + CREATE TABLE public.gtestxx_4 (
> +-    b integer,
> +-    a integer NOT NULL
> ++    a integer NOT NULL,
> ++    b integer
> + )
> [...]
> +    my ($stdout, $stderr) =
> +      run_command([ 'diff', '-u', $dump4_file, $dump5_file ]);
> +    # Clear file names and line numbers from the diffs; those are not
> +    # going to remain the same always. Also clear empty lines and
> +    # normalize newline characters across platforms.
> +    $stdout =~ s/^\@\@.*$//mg;
> +    $stdout =~ s/^.*$dump4_file.*$//mg;
> +    $stdout =~ s/^.*$dump5_file.*$//mg;
> +    $stdout =~ s/^\s*\n//mg;
> +    $stdout =~ s/\r\n/\n/g;
> +    $expected_diff =~ s/\r\n/\n/g;
> +    is($stdout, $expected_diff,
> +       'old and new dumps match after dump and restore');
> +}
>
> I am not a fan of what this patch does, adding the knowledge related
> to the dump filtering within 002_pg_upgrade.pl. Please do not take me
> wrong, I am not against the idea of adding that within this pg_upgrade
> test to save one full cycle of `make check` when checking the
> consistency of the dump. My issue is that this logic should be
> externalized, and it should be in fewer lines of code.
>
> For the externalization part, Ashutosh and I considered a few ideas,
> but one that we found tempting is to create a small .pm, say named
> AdjustDump.pm. This would share some rules with the existing
> AdjustUpgrade.pm, which would be fine IMO even if there is a small
> overlap, documenting the dependency between the two modules. That
> makes the integration with the buildfarm much simpler by not creating
> more dependencies with the modules shared between core and the
> buildfarm code. For the "shorter" part, one idea that I had is to
> apply to the dump a regexp that wipes out the column definitions
> within the parentheses, keeping the CREATE TABLE and any other
> attributes not impacted by the reordering. All that should be
> documented in the module, of course.

Thanks for the suggestion. I didn't understand the dependency on the
buildfarm module. Will the new module be used by the buildfarm
separately? I will work on this soon. Thanks for changing the CF entry
to WoA.

> Another thing would be to improve the backend so that we have better
> support for physical column ordering, which would, I assume (and
> correct me if I'm wrong!), prevent the reordering of the attributes
> like in this inheritance case.
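[As an editorial aside: the attribute reordering discussed here can be reproduced with a short, purely illustrative example; this is not the exact DDL from the regression suite.]

```sql
-- Illustrative sketch of how such a diff can arise. A child table
-- created on its own and attached with ALTER TABLE ... INHERIT keeps
-- its declared column order (b, a):
CREATE TABLE parent (a integer NOT NULL);
CREATE TABLE child (b integer, a integer NOT NULL);
ALTER TABLE child INHERIT parent;

-- A plain (non-binary-upgrade) dump recreates the child roughly as
--   CREATE TABLE child (b integer, a integer NOT NULL) INHERITS (parent);
-- which goes through MergeAttributes() on restore. MergeAttributes()
-- places the inherited column a first, so a second dump of the
-- restored database prints (a, b), producing the hardcoded diff shown
-- in the quoted patch.
```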
> But that would not address the case of dumps taken from older
> versions with a new version of pg_dump, which is something that may
> be interesting to have more tests for in the long term. Overall a
> module sounds like a better solution.

Changing the physical order of the columns of a child table based on
the inherited table seems intentional, per MergeAttributes(). That
logic looks sane by itself. In binary-upgrade mode pg_dump works very
hard to retain the column order by issuing UPDATE commands against the
catalog tables. I don't think mimicking that behaviour is the right
choice for a non-binary dump. I agree with your conclusion that we
should instead fix this by adjusting the diffs. The code to do that
will be part of a separate module.

--
Best Wishes,
Ashutosh Bapat
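[Editorial note: the "wipe out the column definitions within the parentheses" normalization discussed in this thread could be sketched roughly as below. The function name is hypothetical and this is not the committed code; it only shows the shape of the regexp idea.]

```perl
use strict;
use warnings;

# Sketch only: empty the column list of each CREATE TABLE statement so
# that attribute reordering no longer shows up when two dumps are
# diffed. The CREATE TABLE line itself and anything after the closing
# parenthesis (e.g. INHERITS or PARTITION BY clauses) are preserved.
sub adjust_dump_for_compare
{
	my ($dump) = @_;

	# pg_dump puts the closing ")" at the start of a line, so blank
	# out everything between "CREATE TABLE foo (" and that ")".
	$dump =~ s/(^CREATE\s+TABLE\s+[^(]+\()(?:.*?)(^\))/$1\n$2/msg;

	return $dump;
}
```

With this, both dumps of the example child table reduce to the same `CREATE TABLE public.child (\n)` stanza regardless of column order, at the cost of no longer comparing the column definitions themselves.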