Alvaro,

* Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:
> Stephen Frost wrote:
> > Alvaro, Tom,
> >
> > * Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:
> > > Crazy idea: maybe a large fraction of that test could be replaced
> > > with comparisons of the "pg_restore -l" output file rather than
> > > pg_dump's text output (i.e. the TOC entry for each object, rather
> > > than the object's textual representation).  Sounds easier in code
> > > than the current implementation.  Separately, verify that the
> > > textual representation for each TOC entry type is what we expect.
> >
> > I'm not sure how that's different..?  We do check that the textual
> > representation is what we expect, both directly from pg_dump and from
> > pg_restore output, and using the exact same code to check both (which
> > I generally think is a good thing since we want the results from both
> > to more-or-less match up).  What you're proposing here sounds like
> > we're throwing that away in favor of keeping all the same code to
> > test the textual representation, but then only checking for listed
> > contents from pg_restore instead of checking that the textual
> > representation is correct.  That doesn't actually reduce the amount
> > of code, though.
>
> Well, the current implementation compares a dozen pg_dump output text
> files, three hundred lines apiece, against a thousand regexes (give or
> take).  Whenever there is a mismatch, what you get is "this regexp
> failed to match <three hundred lines>" (or sometimes "matched when it
> should not have"), so looking for the mismatch is quite annoying.
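(To make that failure mode concrete: each entry in the big hash in
src/bin/pg_dump/t/002_pg_dump.pl is shaped roughly like the sketch
below; the table name, regexp, and run flags here are illustrative
rather than copied from the file:

    'CREATE TABLE test_table' => {
        create_order => 3,
        create_sql   => 'CREATE TABLE dump_test.test_table (col1 int);',
        # One regexp matched against the full text output of each run
        # (illustrative entry; flag names are from memory, not exact).
        regexp => qr/^
            \QCREATE TABLE test_table (\E\n
            \s+\Qcol1 integer\E\n
            \Q);\E/xm,
        # Per-run flags saying which runs should, and shouldn't, match.
        like   => { clean => 1, defaults => 1 },
        unlike => { data_only => 1 },
    },

and when that qr// fails to match, the TAP diagnostics include the
entire output file it was matched against.)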
Yeah, the big output isn't fun, I agree with that.

> My proposal is that instead of looking at three hundred lines, you'd
> look for 50 lines of `pg_restore -l` output -- is element XYZ in there
> or not.  Quite a bit simpler for the guy adding a new test.  This
> tests combinations of pg_dump switches: are we dumping the right set
> of objects.

That might be possible to do, though I'm not sure that it'd really be
only 50 lines.

> *Separately* test that each individual TOC entry type ("table data",
> "index", "tablespace") is dumped as this or that SQL command, where
> you create a separate dump file for each object type.  So you match a
> single TOC entry to a dozen lines of SQL, half of them comments --
> pretty easy to see where a mismatch is.

Yikes.  If you're thinking that we would call pg_restore independently
for each object, I'd be worried about the runtime increasing quite a
bit.  There are also a few cases where we check dependencies between
objects that we'd need to be mindful of, though we might be able to
take care of those if we can keep the runtime down.

Certainly, part of the way the existing code works was specifically to
try to minimize the runtime.  Perhaps the approach of minimizing the
invocations and building a bunch of regexps to run against the
resulting output is more expensive than running pg_restore a bunch of
times, but I'd want to see that to be sure, as some really simple
timings locally on a 6.5k -Fc dump file suggest it takes 30-40ms just
to produce the TOC.

This also doesn't really help with the big perl hash, but these two
ideas don't seem to conflict, so perhaps we could still tackle the big
perl hash with the approach I described up-thread and also find a way
to implement this, if it doesn't make the regression tests too much
slower.

Thanks!

Stephen
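P.S. For anyone following along, the `pg_restore -l` listing being
discussed is the archive TOC, one line per entry.  Against a trivial
database it looks along these lines (the dump IDs and OIDs here are
invented for illustration):

    ;
    ; Selected TOC Entries:
    ;
    3; 2615 2200 SCHEMA - public postgres
    214; 1259 16396 TABLE public test_table postgres
    2180; 0 16396 TABLE DATA public test_table postgres

Checking whether a given entry line is present would certainly be
easier to eyeball on a failure than a three-hundred-line SQL file.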