On Mon, 29 Jan 2024, 22:52 Adrian Klaver, <adrian.kla...@aklaver.com> wrote:
> On 1/29/24 00:12, Laura Smith wrote: > > Hi > > > > Let's say I've got a scenario where I'm doing a pg_dump replication > rather than online streaming, e.g. due to air-gap or whatever. > > > > Is there a scriptable way to validate the restore ? e.g. using doing > something clever with ctid or something to ensure both the schema and all > its rows were restored to the same point at which the dump was taken ? > > Assuming you are using pg_restore on a non-plain text dump file then > from pg_restore.c: > > * pg_restore.c > > * pg_restore is an utility extracting postgres database definitions > * from a backup archive created by pg_dump using the archiver > * interface. > * > * pg_restore will read the backup archive and > * dump out a script that reproduces > * the schema of the database in terms of > * user-defined types > * user-defined functions > * tables > * indexes > * aggregates > * operators > * ACL - grant/revoke > * > * the output script is SQL that is understood by PostgreSQL > * > * Basic process in a restore operation is: > * > * Open the Archive and read the TOC. > * Set flags in TOC entries, and *maybe* reorder them. > * Generate script to stdout > * Exit > > Then: > > pg_restore -l -f <output_file> <dump_file> > > to get the TOC mentioned above. Walk through that to verify schema is > the same in the restored database. > > This will not tell you whether all the data was transferred. You will > either have to trust from pg_dump.c: > > * pg_dump will read the system catalogs in a database and dump out a > * script that reproduces the schema in terms of SQL that is > understood > * by PostgreSQL > * > * Note that pg_dump runs in a transaction-snapshot mode transaction, > * so it sees a consistent snapshot of the database including system > * catalogs. However, it relies in part on various specialized > backend > * functions like pg_get_indexdef(), and those things tend to look at > * the currently committed state. So it is possible to get 'cache > * lookup failed' error if someone performs DDL changes while a > dump is > * happening. The window for this sort of thing is from the > acquisition > * of the transaction snapshot to getSchemaData() (when pg_dump > acquires > * AccessShareLock on every table it intends to dump). It isn't > very large, > * but it can happen. > > Or come up with way to capture the state of the data at the time of dump > and then compare to restored database. Something like Ron posted. > Right, for me, state, not just record count is what I'm interested in (for the initial full table copy part of replication). So, given the explanation about the possible per-table window, is there some property of the table that could be used to confirm that a table has made it across? I guess there is such a thing since the following incremental syncing would presumably need it. I had hoped the LSN was this thing, but confirmation would be great. Thanks, Shaheed > > > > Thanks ! > > > > Laura > > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com > > > >