On Wed, Sep 27, 2017 at 2:55 PM, Jerry Sievers <gsiever...@comcast.net>
wrote:

> John Britto <j...@sodexis.com> writes:
>
> > Hello,
> >
> > I have a streaming replication setup along with WAL archive.
> >
> > archive_command = 'test ! -f /var/pg_archive/%f && cp %p <archive
> > location>%f && scp %p postgres@192.168.0.123:<archive location>/%f'
> >
> > When the scp command fails, the master repeatedly retries sending that
> > archived WAL file to the standby.  Meanwhile, the pg_xlog directory
> > keeps growing with newer WAL files.
> >
> > Streaming replication itself hasn't had a problem: when I checked, the
> > WAL write location on the primary matched the last WAL location
> > received/replayed on the standby.
> >
> > Since pg_xlog on the master now holds a few newer WAL files, archiving
> > lags behind the current WAL file.  Whenever a new WAL file appears in
> > pg_xlog, the master still picks the oldest unarchived file to send to
> > the standby.
>
> Yup, Pg is going to handle the unshipped WALs one at a time, and it will
> do them in order, oldest (lowest file name) first.
>
> > How can I force PostgreSQL to batch-copy the lagging WAL files to the
> > archive and then send them to the standby?  Can I do this manually with
> > rsync?  I also wonder how PostgreSQL would know about the change, since
> > it tracks progress in archive_status with .ready and .done files.
>
> I suggest you fix your basic archiving routine to complete and exit with
> success to postgres.
>

+1

scp %p host:/archive/%f
if [ "$?" -ne 0 ]
then
    # scp failed: log the segment, keep a local copy, and still report
    # success so postgres doesn't stall the archiver
    echo "`date`:FAILED:%p" >> /var/log/failed_xlog.log
    cp %p /localarchive/%f
    exit 0
fi

Essentially, always make sure that you are returning 0 to postgres.  If
there is a failure, either log it or handle it separately.  This code
snippet is *NOT COMPLETE*; there's a lot more to do in order to make it
production-ready and recoverable.  The biggest issue I've had with scp is
that you have to set and enforce a timeout, and trap it when it fires.
Note that the above only works until your local machine (or the
/localarchive partition) runs out of space, so it's *really* important
that you have ultra-solid logging and alerting around this.
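
For what it's worth, here is a minimal sketch of bounding that scp call,
assuming GNU coreutils timeout(1) and an OpenSSH client are available; the
host, paths, 60-second limit, and ConnectTimeout value are placeholders:

# sketch only: ConnectTimeout bounds the initial connection, while
# timeout(1) bounds the whole transfer so a hung scp can't stall archiving
timeout 60 scp -o ConnectTimeout=10 %p host:/archive/%f
if [ "$?" -ne 0 ]
then
    echo "`date`:FAILED:%p" >> /var/log/failed_xlog.log
    cp %p /localarchive/%f
    exit 0
fi

timeout(1) exits non-zero (124) when the limit is hit, so the failure
branch catches a hang the same way it catches a refused connection.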




> And as for archive command scripts in general, simpler is better.
>
> If you want to manually ship them in bulk, you may do so, but then you
> will need to remove the corresponding archive_status/$foo.ready file so
> that postgres won't keep trying to ship the same one.
>

I'm a huge fan of this strategy, especially if you're sending to a remote
datacenter.

archive_command.sh:

# copy the WAL segment into the local archive; on failure, log the
# segment name and still return success to postgres
cp %p /localarchive/%f
if [ "$?" -ne 0 ]
then
    echo "`date`:FAILED:%p" >> /var/log/failed_xlog.log
    exit 0
fi

send_archive_to_remote.sh:

# bulk-ship everything in the local archive to the remote host
rsync -avzP /localarchive/* host:/archive/


Of course, you now have to deal with files that have been removed on the
slave and make sure they also get removed from the master's local archive
appropriately, but this is fairly straightforward.
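
If it helps, here is a rough sketch of that cleanup on the master side;
the 7-day retention window and the paths are placeholders, and it assumes
the remote archive is the copy you actually restore from:

# only prune local copies once the rsync to the remote archive succeeds
rsync -avzP /localarchive/* host:/archive/
if [ "$?" -eq 0 ]
then
    # drop local segments older than a week; size this window to cover
    # the longest outage you want to ride out
    find /localarchive -type f -mtime +7 -delete
fi

pg_archivecleanup (shipped with PostgreSQL) is another option if you would
rather prune by segment name than by file age.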

--Scott




> HTH
>
> > Please assist.
> >
> > Thanks,
> >
> > John Britto
> >
> >
> >
>
> --
> Jerry Sievers
> Postgres DBA/Development Consulting
> e: postgres.consult...@comcast.net
> p: 312.241.7800
>
>



--
Scott Mead
Sr. Architect
OpenSCG <http://openscg.com>
