On Tue, Mar 3, 2015 at 7:49 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> Hi,
>
> I've regularly wished we had automated tests that set up HS and then
> compare primary/standby at the end to verify replay worked
> correctly.
>
> Heikki's page comparison tool deals with some of that verification, but
> it's really quite expensive and doesn't care about runtime-only
> differences. I.e. it doesn't test HS at all.
>
> I every now and then run installcheck against a primary, verify that
> replay works without errors, and then compare pg_dumpall from both
> clusters. Unfortunately that currently requires hand inspection of the
> dumps, as there are differences like:
>
> -SELECT pg_catalog.setval('default_seq', 1, true);
> +SELECT pg_catalog.setval('default_seq', 33, true);
>
> The reason for these differences is that the primary increases the
> sequence's last_value by 1, but temporarily sets it ahead by SEQ_LOG_VALS
> before XLogInsert(). So the two differ.
>
> Does anybody have a good idea how to get rid of that difference? One way
> to do that would be to log the value the standby is sure to have - but
> that's not entirely trivial.
>
> I'd very much like to add an automated test like this to the tree, but I
> don't see a way to do that sanely without a comparison tool...

Couldn't we just arbitrarily exclude the sequences' internal state from the comparison? That wouldn't work where the standby has been promoted and then used in a way that draws on the sequence (with the same workload being put through the now-promoted standby and the original master), but I don't think that's what you were asking about.

How many similar issues have you seen?

In the case where you have a promoted replica and put the same workload through both it and the master, I've seen "pg_dump -s" dump objects in different orders, for no apparent reason. That is kind of annoying, but I never traced it back to the cause (nor have I excluded PEBCAK as the real cause).

Cheers,

Jeff
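
P.S. A rough, untested sketch of what I mean by filtering the sequence state out of the comparison (the port numbers and file names here are just placeholders; it assumes the primary listens on 5432 and the standby on 5433):

    # dump both clusters, dropping the setval() lines so the last_value
    # skew doesn't show up as spurious differences
    pg_dumpall -p 5432 | grep -v 'pg_catalog.setval(' > primary.sql
    pg_dumpall -p 5433 | grep -v 'pg_catalog.setval(' > standby.sql
    diff -u primary.sql standby.sql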