Hello list!

I'm using 9.1.5 on Ubuntu 11.10, in a streaming replication scenario. On my 
slave, recovery.conf states:

standby_mode = on
restore_command = '/usr/local/omnipitr/bin/omnipitr-restore -D 
/var/lib/postgresql/9.1/main/ --source gzip=/data/dbanalytics-wal/ 
--remove-unneeded --temp-dir /var/tmp/omnipitr -l 
/var/log/omnipitr/restore-^Y-^m-^d.log --streaming-replication --verbose 
--error-pgcontroldata hang "%f" "%p"'
archive_cleanup_command = '/usr/local/omnipitr/bin/omnipitr-cleanup --verbose 
--log /var/log/omnipitr/cleanup-^Y-^m-^d.log --archive 
gzip=/data/dbanalytics-wal/ "%r"'
primary_conninfo = 'host=master port=5432 user=dbrepl password=password'
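For reference, on the slave I can watch whether replay is still advancing with something like the following (just the standard 9.1 monitoring functions, nothing OmniPITR-specific):

```shell
# Run on the slave: if pg_last_xlog_replay_location() stops advancing
# while pg_last_xlog_receive_location() keeps moving, replay is stalled,
# and no new restartpoints (hence no cleanup) can happen.
psql -c "SELECT pg_last_xlog_receive_location(),
                pg_last_xlog_replay_location(),
                pg_last_xact_replay_timestamp();"
```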


I ran out of disk space on the slave because the archived WAL segments were not
removed. The documentation for archive_cleanup_command states[1]:

"This optional parameter specifies a shell command that will be executed at 
every restartpoint."

The slave's last reference to a restart point was:

2012-11-28 20:18:08.279 UTC - @ 20910 (00000) 2012-11-27 21:10:01 UTC - LOG:  
restartpoint complete: wrote 58947 buffers (22.5%); 0 transaction log file(s) 
added, 0 removed, 103 recycled; write=539.537 s, sync=0.088 s, total=540.221 s; 
sync files=122, longest=0.087 s, average=0.000 s
2012-11-28 20:18:08.279 UTC - @ 20910 (00000) 2012-11-27 21:10:01 UTC - LOG:  
recovery restart point at E8F/F71DA1E0
2012-11-28 20:18:08.279 UTC - @ 20910 (00000) 2012-11-27 21:10:01 UTC - DETAIL: 
 last completed transaction was at log time 2012-11-28 17:15:51.275427+00
2012-11-28 21:19:24.245 UTC - svanalytics@svanalytics_staging 3476 (00000) 
2012-11-28 19:55:50 UTC - LOG:  duration: 4984689.535 ms  statement: SELECT ...

OmniPITR's restore log ends with:

2012-11-28 13:54:13.396142 +0000 : 26378 : omnipitr-restore : LOG : Timer 
[Copying segment 0000000200000E8E00000069 to 
/var/lib/postgresql/9.1/main/pg_xlog/RECOVERYXLOG] took: 0.238s
2012-11-28 13:54:13.485916 +0000 : 26378 : omnipitr-restore : LOG : Segment 
0000000200000E8E00000069 restored
2012-11-28 13:54:13.787225 +0000 : 26384 : omnipitr-restore : LOG : Called with 
parameters: -D /var/lib/postgresql/9.1/main/ --source 
gzip=/data/dbanalytics-wal/ --remove-unneeded --temp-dir /var/tmp/omnipitr -l 
/var/log/omnipitr/restore-^Y-^m-^d.log --streaming-replication --verbose 
--error-pgcontroldata hang 0000000200000E8E0000006A pg_xlog/RECOVERYXLOG
2012-11-28 13:54:13.802772 +0000 : 26384 : omnipitr-restore : FATAL : Requested 
file does not exist, and it is streaming replication environment. Dying.


And OmniPITR's cleanup log ends with:

2012-11-28 20:18:11.237740 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 
0000000200000E8F000000F2.gz removed.
2012-11-28 20:18:11.256186 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 
0000000200000E8F000000F3.gz removed.
2012-11-28 20:18:11.258942 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 
0000000200000E8F000000F4.gz removed.
2012-11-28 20:18:11.261542 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 
0000000200000E8F000000F5.gz removed.
2012-11-28 20:18:11.263758 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 
0000000200000E8F000000F6.gz removed.
2012-11-28 20:18:11.265259 +0000 : 10384 : omnipitr-cleanup : LOG : 34 segments 
removed.

It's 2012-12-30 14:35 UTC on the machine.

Why were no new restartpoints performed? I had 4008 archived WAL segments on my 
slave, and I expected them to be removed as streaming replication progressed. 
Are restartpoints prevented while long-running queries are active?
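As a rough sanity check on the gap between the two logs, the cleanup log's last removed segment sits well ahead of the restore log's last restored one. A back-of-the-envelope sketch (my own arithmetic, not anything from OmniPITR; it assumes the pre-9.3 convention that each logical log file holds 255 usable segments):

```shell
# Pre-9.3 WAL segment names are TTTTTTTTLLLLLLLLSSSSSSSS
# (timeline, log, seg; 8 hex digits each). With 255 usable
# 16 MB segments per log file, a linear index lets me count
# how many segments sit between two file names.
idx() { printf '%d\n' $(( 16#${1:8:8} * 255 + 16#${1:16:8} )); }

last_restored=0000000200000E8E00000069   # last segment the restore log shows
last_removed=0000000200000E8F000000F6    # last segment the cleanup log removed
echo $(( $(idx "$last_removed") - $(idx "$last_restored") ))   # 396 segments apart
```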

Thanks!
François Beausoleil

  [1]: http://www.postgresql.org/docs/current/static/archive-recovery-settings.html#ARCHIVE-CLEANUP-COMMAND

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
