On Wed, Sep 27, 2017 at 2:55 PM, Jerry Sievers <gsiever...@comcast.net> wrote:
> John Britto <j...@sodexis.com> writes:
>
> > Hello,
> >
> > I have a streaming replication setup along with a WAL archive.
> >
> > archive_command = 'test ! -f /var/pg_archive/%f && cp %p <archive
> > location>%f && scp %p postgres@192.168.0.123:<archive location>/%f'
> >
> > When the scp command fails, the master repeatedly tries to resend the
> > archived WAL to the standby, and in the meantime the pg_xlog directory
> > fills with newer WAL files.
> >
> > Streaming replication hasn't had this problem: when I checked, the WAL
> > write location on the primary was the same as the last WAL location
> > received/replayed on the standby.
> >
> > Since pg_xlog on the master now holds several newer WAL files,
> > archiving lags behind the current WAL file: when a new WAL file
> > appears in pg_xlog, the master still picks the oldest unshipped file
> > to send to the standby.
>
> Yup, Pg is going to handle the unshipped WALs one at a time, and it
> will do them in order, oldest (lowest file name) first.
>
> > How should I force PostgreSQL to batch-copy the lagging WAL files to
> > pg_archive and then send them to the standby? Can I do this manually
> > using rsync? I wonder how PostgreSQL knows about the changes, because
> > it maintains info in archive_status with the extensions .ready and
> > .done.
>
> I suggest you fix your basic archiving routine to complete and exit
> with success to postgres.

+1

    scp %p host:/archive/%f
    if [ "$?" -ne 0 ]
    then
        # scp failed: log it, keep a local copy, and still report
        # success so postgres moves on to the next segment.
        echo "`date`:FAILED:%p" >> /var/log/failed_xlog.log
        cp %p /localarchive/%f
        exit 0
    fi

Essentially, always make sure that you are returning 0 to postgres. If
there is a failure, either log it or handle it separately. This code
snippet is *NOT COMPLETE*; there's a lot more to do in order to make it
production-ready and recoverable. The biggest issue I've had with scp is
that you have to set and enforce a timeout, and trap that timeout
(there's a sketch in the P.S. below).

Note, the above only works until your local machine (or the
/localarchive partition) runs out of space. It's *really* important that
you have ultra-solid logging and alerting around this.

> And as for archive command scripts in general, simpler is better.
>
> If you want to manually ship them in bulk, you may do so, but then you
> will need to remove the corresponding archive_status/$foo.ready file so
> that postgres won't keep trying to ship the same one.

I'm a huge fan of this strategy, especially if you're sending to a
remote datacenter.

archive_command.sh:

    cp %p /localarchive/%f
    if [ "$?" -ne 0 ]
    then
        # No copy exists anywhere at this point, so return failure and
        # let postgres retry rather than silently lose the segment.
        echo "`date`:FAILED:%p" >> /var/log/failed_xlog.log
        exit 1
    fi

send_archive_to_remote.sh:

    rsync -avzP /localarchive/* host:/archive/

Of course, now you have to deal with files that are removed from the
slave and make sure they get removed from the master appropriately, but
this is fairly straightforward (the P.S. below has a sketch of that,
too).

--Scott

> HTH
>
> > Please assist.
> >
> > Thanks,
> >
> > John Britto
>
> --
> Jerry Sievers
> Postgres DBA/Development Consulting
> e: postgres.consult...@comcast.net
> p: 312.241.7800

--
Scott Mead
Sr. Architect
OpenSCG
http://openscg.com
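
P.S. A few sketches to make the above concrete. First, the timeout
problem: this is roughly what the scp wrapper looks like with a hard cap
bolted on. It's a sketch only; the script name, the 10/60-second
budgets, and the host:/archive target are placeholders, and it assumes
the GNU coreutils `timeout` utility is available.

    #!/bin/sh
    # Hypothetical wrapper, invoked as:
    #   archive_command = '/path/to/archive_wal.sh %p %f'
    WALPATH="$1"   # %p: segment path, relative to the data directory
    WALFILE="$2"   # %f: segment file name

    # -o ConnectTimeout bounds connection setup; `timeout` bounds the
    # whole transfer and kills scp (exit status 124) if it hangs.
    timeout 60 scp -o ConnectTimeout=10 "$WALPATH" "host:/archive/$WALFILE"
    rc=$?
    if [ "$rc" -ne 0 ]
    then
        echo "`date`:FAILED(rc=$rc):$WALPATH" >> /var/log/failed_xlog.log
        # Fall back to the local archive; only report success to
        # postgres if a copy actually exists somewhere.
        cp "$WALPATH" "/localarchive/$WALFILE" || exit 1
    fi
    exit 0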
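
If you go the bulk-ship-by-hand route Jerry mentions, the loop is
essentially the following. Again a sketch: the data directory path is
made up, it assumes a pre-10 cluster (hence pg_xlog), and it renames
.ready to .done rather than just deleting the .ready file, since that's
what the archiver itself does and it lets checkpoints recycle the
segment afterwards.

    #!/bin/sh
    cd /var/lib/postgresql/9.6/main/pg_xlog || exit 1

    # Glob order is lexicographic, i.e. oldest segment first.
    for ready in archive_status/*.ready
    do
        [ -e "$ready" ] || break                # no backlog pending
        wal=`basename "$ready" .ready`
        rsync -a "$wal" "host:/archive/$wal" || continue
        # Tell postgres this segment is handled so it stops retrying.
        mv "archive_status/$wal.ready" "archive_status/$wal.done"
    done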
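
And one way to handle the "remove them from the master appropriately"
part of the second strategy: rsync can do the pruning itself.
--remove-source-files deletes each file from /localarchive only after
it has been transferred successfully, so send_archive_to_remote.sh
becomes something like this (whether you can afford to drop the local
copy straight away depends on your retention requirements):

    #!/bin/sh
    # send_archive_to_remote.sh, now pruning as it goes
    rsync -avzP --remove-source-files /localarchive/ host:/archive/ \
        || echo "`date`:FAILED:bulk ship" >> /var/log/failed_xlog.log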