On 1/29/24 00:12, Laura Smith wrote:
Hi

Let's say I've got a scenario where I'm doing a pg_dump replication rather than 
online streaming, e.g. due to air-gap or whatever.

Is there a scriptable way to validate the restore ?  e.g. using doing something 
clever with ctid or something to ensure both the schema and all its rows were 
restored to the same point at which the dump was taken ?

Assuming you are using pg_restore on a non-plain text dump file then
from pg_restore.c:

* pg_restore.c

*      pg_restore is an utility extracting postgres database definitions
 *      from a backup archive created by pg_dump using the archiver
 *      interface.
 *
 *      pg_restore will read the backup archive and
 *      dump out a script that reproduces
 *      the schema of the database in terms of
 *                user-defined types
 *                user-defined functions
 *                tables
 *                indexes
 *                aggregates
 *                operators
 *                ACL - grant/revoke
 *
 * the output script is SQL that is understood by PostgreSQL
 *
 * Basic process in a restore operation is:
 *
 *      Open the Archive and read the TOC.
 *      Set flags in TOC entries, and *maybe* reorder them.
 *      Generate script to stdout
 *      Exit

Then:

        pg_restore -l -f <output_file> <dump_file>

to get the TOC mentioned above. Walk through that to verify schema is the same in the restored database.

This will not tell you whether all the data was transferred. You will either have to trust from pg_dump.c:

 *      pg_dump will read the system catalogs in a database and dump out a
* script that reproduces the schema in terms of SQL that is understood
 *      by PostgreSQL
 *
 *      Note that pg_dump runs in a transaction-snapshot mode transaction,
 *      so it sees a consistent snapshot of the database including system
 *      catalogs. However, it relies in part on various specialized backend
 *      functions like pg_get_indexdef(), and those things tend to look at
 *      the currently committed state.  So it is possible to get 'cache
* lookup failed' error if someone performs DDL changes while a dump is * happening. The window for this sort of thing is from the acquisition * of the transaction snapshot to getSchemaData() (when pg_dump acquires * AccessShareLock on every table it intends to dump). It isn't very large,
 *      but it can happen.

Or come up with way to capture the state of the data at the time of dump and then compare to restored database. Something like Ron posted.


Thanks !

Laura



--
Adrian Klaver
adrian.kla...@aklaver.com



Reply via email to