Hello,

I am playing with a script that implements physical backups by snapshotting the 
EBS-backed software RAID. My basic workflow is this:

1. Stop PG on the slave
2. pg_start_backup on the master
3. On the slave:
   A. unmount the PG RAID
   B. snapshot each disk in the RAID
   C. mount the PG RAID
4. pg_stop_backup on the master
5. Restart PG on the slave
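
The sequence above can be sketched as an ordered command plan. This is only an
illustration under stated assumptions, not the actual script: the function name
backup_plan, the use of the AWS CLI for snapshots, the "fast" shutdown mode, the
backup label, and a mount point listed in /etc/fstab are all hypothetical
choices. It only builds the list of commands and does not run anything.

```python
def backup_plan(volume_ids, mount_point, data_dir, label="ebs-snapshot"):
    """Return (host, command) pairs in the order of steps 1-5.

    Hypothetical helper: hosts are just the labels "master" and "slave";
    how the commands reach each host (ssh, etc.) is left out on purpose.
    """
    plan = [
        # 1. Stop PG on the slave
        ("slave", ["pg_ctl", "stop", "-D", data_dir, "-m", "fast"]),
        # 2. pg_start_backup on the master
        ("master", ["psql", "-c", "SELECT pg_start_backup('%s')" % label]),
        # 3A. unmount the PG RAID
        ("slave", ["umount", mount_point]),
    ]
    # 3B. snapshot each disk in the RAID (AWS CLI is an assumption here)
    for volume_id in volume_ids:
        plan.append(("slave", ["aws", "ec2", "create-snapshot",
                               "--volume-id", volume_id,
                               "--description", label]))
    # 3C. mount the PG RAID again (assumes an /etc/fstab entry)
    plan.append(("slave", ["mount", mount_point]))
    # 4. pg_stop_backup on the master
    plan.append(("master", ["psql", "-c", "SELECT pg_stop_backup()"]))
    # 5. Restart PG on the slave
    plan.append(("slave", ["pg_ctl", "start", "-D", data_dir]))
    return plan


if __name__ == "__main__":
    # Placeholder volume IDs and paths, for illustration only.
    for host, cmd in backup_plan(["vol-aaaa", "vol-bbbb"], "/mnt/pg", "/mnt/pg/data"):
        print(host, " ".join(cmd))
```

If something like this were actually executed, step 4 would probably belong in
a "finally"-style block so that the master does not stay in backup mode when a
snapshot call fails.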

Step 3 is actually quite fast. However, on the master I end up seeing the 
following warning:

WARNING:  transaction log file "00000001000000CC00000076" could not be archived: too many failures

I am guessing (I will confirm with timestamps later) that this warning happens 
during steps 3A-3C. However, my questions below stand regardless of when this 
failure occurs.

It is worth noting that the slave (seemingly) catches up eventually: it 
recovers the later log files, and streaming replication becomes current again. 
Can I trust this state?

Should I be concerned about this warning? Is it a simple blip that can safely 
be ignored, or have I lost data? From googling, it looks like the number of 
retry attempts is not configurable (it appears to have retried a handful of 
times).

If this is indeed a real problem, am I best off changing my archive_command to 
retain logs in a transient location when I am in "snapshot mode", and then ship 
them in bulk once the snapshot has completed? Are there any other remedies that 
I am missing?
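
A minimal sketch of that "snapshot mode" idea, under stated assumptions: a flag
file marks snapshot mode, archive_dir stands in for wherever archive_command
normally ships segments, and all names here (archive_wal, flush_spool, the flag
file, the spool directory) are hypothetical. It relies only on the documented
contract that archive_command must return zero only on success and should not
overwrite an existing file.

```python
import os
import shutil
import tempfile


def _copy_no_overwrite(src, dest_dir, name):
    """Copy src to dest_dir/name via a temp file; refuse to overwrite."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, name)
    if os.path.exists(dest):
        raise FileExistsError(dest)
    fd, tmp = tempfile.mkstemp(dir=dest_dir, prefix=name + ".", suffix=".tmp")
    os.close(fd)
    try:
        shutil.copyfile(src, tmp)
        os.rename(tmp, dest)  # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return dest


def archive_wal(wal_path, wal_name, archive_dir, spool_dir, flag_file):
    """Archive one segment; spool it locally while the flag file exists.

    Returns 0 on success. Any exception should be turned into a non-zero
    exit status by the caller so that PostgreSQL retries the segment.
    """
    target = spool_dir if os.path.exists(flag_file) else archive_dir
    _copy_no_overwrite(wal_path, target, wal_name)
    return 0


def flush_spool(spool_dir, archive_dir):
    """Ship spooled segments to the archive in name order; return their names."""
    shipped = []
    if not os.path.isdir(spool_dir):
        return shipped
    for name in sorted(os.listdir(spool_dir)):
        if name.endswith(".tmp"):  # skip half-written temp files
            continue
        src = os.path.join(spool_dir, name)
        _copy_no_overwrite(src, archive_dir, name)
        os.remove(src)  # remove only after the copy succeeded
        shipped.append(name)
    return shipped
```

The trade-off in this sketch is that PostgreSQL treats a spooled segment as
archived and may recycle it, so until flush_spool has run the only archived
copy lives in the spool directory on the master.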

Thank you very much for your time,

Andrew Hannon  
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)