On Mon, 29 Jan 2024, 22:52 Adrian Klaver, <adrian.kla...@aklaver.com> wrote:

> On 1/29/24 00:12, Laura Smith wrote:
> > Hi
> >
> > Let's say I've got a scenario where I'm doing a pg_dump replication
> rather than online streaming, e.g. due to air-gap or whatever.
> >
> > Is there a scriptable way to validate the restore ?  e.g. using doing
> something clever with ctid or something to ensure both the schema and all
> its rows were restored to the same point at which the dump was taken ?
>
> Assuming you are using pg_restore on a non-plain text dump file then
> from pg_restore.c:
>
> * pg_restore.c
>
> *      pg_restore is an utility extracting postgres database definitions
>   *      from a backup archive created by pg_dump using the archiver
>   *      interface.
>   *
>   *      pg_restore will read the backup archive and
>   *      dump out a script that reproduces
>   *      the schema of the database in terms of
>   *                user-defined types
>   *                user-defined functions
>   *                tables
>   *                indexes
>   *                aggregates
>   *                operators
>   *                ACL - grant/revoke
>   *
>   * the output script is SQL that is understood by PostgreSQL
>   *
>   * Basic process in a restore operation is:
>   *
>   *      Open the Archive and read the TOC.
>   *      Set flags in TOC entries, and *maybe* reorder them.
>   *      Generate script to stdout
>   *      Exit
>
> Then:
>
>         pg_restore -l -f <output_file> <dump_file>
>
> to get the TOC mentioned above. Walk through that to verify schema is
> the same in the restored database.
>
> This will not tell you whether all the data was transferred. You will
> either have to trust from pg_dump.c:
>
>   *      pg_dump will read the system catalogs in a database and dump out a
>   *      script that reproduces the schema in terms of SQL that is
> understood
>   *      by PostgreSQL
>   *
>   *      Note that pg_dump runs in a transaction-snapshot mode transaction,
>   *      so it sees a consistent snapshot of the database including system
>   *      catalogs. However, it relies in part on various specialized
> backend
>   *      functions like pg_get_indexdef(), and those things tend to look at
>   *      the currently committed state.  So it is possible to get 'cache
>   *      lookup failed' error if someone performs DDL changes while a
> dump is
>   *      happening. The window for this sort of thing is from the
> acquisition
>   *      of the transaction snapshot to getSchemaData() (when pg_dump
> acquires
>   *      AccessShareLock on every table it intends to dump). It isn't
> very large,
>   *      but it can happen.
>
> Or come up with way to capture the state of the data at the time of dump
> and then compare to restored database. Something like Ron posted.
>

Right, for me, state, not just record count is what I'm interested in (for
the initial full table copy part of replication). So, given the explanation
about the possible per-table window, is there some property of the table
that could be used to confirm that a table has made it across?

I guess there is such a thing since the following incremental syncing would
presumably need it. I had hoped the LSN was this thing, but confirmation
would be great.

Thanks, Shaheed


> >
> > Thanks !
> >
> > Laura
> >
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>
>

Reply via email to