Hi,

Thanks for your response. I have just repeated the master/slave switchover
once again:

- one master and one slave (total size of each server is more than 4GB).
Currently the last log line of the slave is "started streaming WAL from
primary at 2/D6000000 on timeline 10".

- stop master; the slave shows the following logs:
          replication terminated by primary server
          End of WAL reached on timeline 10 at 2/D69304D0
          Invalid record length at 2/D69304D0
          could not connect to primary server

- promote the slave:
          receive promote request
          redo done at 2/D6930460
          selected new timeline ID: 11
          archive recovery complete
          MultiXact member wraparound protections are now enabled
          database system is ready to accept connections
          autovacuum launcher started

- start and stop old master, then run pg_rewind (all executed
immediately after promoting the slave). Logs of pg_rewind:
          servers diverged at WAL position 2/D69304D0 on timeline 10
          rewinding from last common checkpoint at 2/D6930460 on timeline 10
          reading source file list
          reading target file list
          reading WAL in target
          need to copy 4168 MB (total source directory is 4186 MB)
          4268372/4268372 kB (100%) copied
          creating backup label and updating control file
          syncing target data directory
          Done!
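
To put a number on how much was copied, here is a minimal Python sketch that
parses the "need to copy" line from the pg_rewind output above. The function
name copy_fraction is only an illustrative helper, not part of pg_rewind, and
the regular expression assumes the exact wording shown in the log.

```python
import re


def copy_fraction(summary_line):
    """Parse a pg_rewind 'need to copy' line.

    Returns (to_copy_mb, total_mb, fraction). Hypothetical helper that
    assumes the message wording shown in the log above.
    """
    m = re.search(
        r"need to copy (\d+) MB \(total source directory is (\d+) MB\)",
        summary_line,
    )
    if m is None:
        raise ValueError("not a pg_rewind 'need to copy' line")
    to_copy, total = int(m.group(1)), int(m.group(2))
    return to_copy, total, to_copy / total


line = "need to copy 4168 MB (total source directory is 4186 MB)"
to_copy, total, frac = copy_fraction(line)
print(f"{to_copy} of {total} MB = {frac:.1%}")
```

For this run the fraction is above 99%, so nearly the whole source directory
was scheduled for copying.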

If I run pg_rewind with the debug option, it just shows an additional list of
files copied from directories such as base or pg_tblspc. I am confident that
no data has been inserted or modified since the first step. The only
difference between the two servers is caused by restarting the old master.
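
As a rough cross-check, the two WAL positions reported by pg_rewind can be
compared directly. The sketch below assumes the usual textual LSN format (two
hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit
position); parse_lsn is a hypothetical helper written for this illustration.

```python
def parse_lsn(lsn):
    """Convert an LSN such as '2/D69304D0' into a 64-bit integer byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


diverged = parse_lsn("2/D69304D0")    # "servers diverged at WAL position ..."
checkpoint = parse_lsn("2/D6930460")  # "last common checkpoint at ..."

# Distance in bytes between the last common checkpoint and the divergence point
gap = diverged - checkpoint
print(gap)
```

This only measures the distance between the last common checkpoint and the
divergence point. It says nothing about how much WAL the old master wrote
after being restarted.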

Thanks and Regards,

Hung Phan



On Wed, Sep 13, 2017 at 10:48 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:

> On Wed, Sep 13, 2017 at 12:41 PM, Hung Phan <hungphan...@gmail.com> wrote:
> > I have tested pg_rewind (ver 9.5) with the following scenario:
> >
> > - one master and one slave (total size of each server is more than 4GB)
> > - set wal_log_hints=on and restart both
> > - stop master, promote slave
> > - start old master again (now two servers have diverged)
> > - stop old master, run pg_rewind with progress option
>
> That's a good flow. Don't forget to run a manual checkpoint after
> promotion to update the control file of the promoted standby so as
> pg_rewind is able to identify the timeline difference between the
> source and the target servers.
>
> > The pg_rewind ran successfully but I saw it copied more than 4GB (4265891 kB
> > copied). So I wonder there was very minor difference between two servers but
> > why did pg_rewind copy almost all data of new master?
>
> Without knowing exactly the list of things that have been registered
> as things to copy from the active source to the target, it is hard to
> give a conclusion. But my bet here is that you left the target server
> online long enough that it had a bunch of blocks updated, causing more
> relation blocks to be copied from the source because more efforts
> would be needed to re-sync it. That's only an assumption without data
> with clear numbers, numbers that could be found using the --debug
> messages of pg_rewind.
> --
> Michael
>
