On Mar 26, 2014, at 9:04 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Tue, Mar 25, 2014 at 6:33 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Tuesday, March 25, 2014, Steven Schlansker <ste...@likeness.com> wrote:
> Hi everyone,
> 
> I have a Postgres 9.3.3 database machine.  Due to some intelligent work on 
> the part of someone who shall remain nameless, the WAL archive command 
> included a ‘> /dev/null 2>&1’ which masked archive failures until the disk 
> entirely filled with 400GB of pg_xlog entries.
> 
> PostgreSQL itself should be logging failures to the server log, regardless of 
> whether the archive command logs them itself.
> 
> 
> I have fixed the archive command and can see WAL segments being shipped off 
> of the server, however the xlog remains at a stable size and is not 
> shrinking.  In fact, it’s still growing at a (much slower) rate.
> 
> The leading edge of the log files should be archived as soon as they fill up, 
> and recycled/deleted two checkpoints later.  The trailing edge should be 
> archived upon checkpoints and then recycled or deleted.  I think there is a 
> throttle on how many off the trailing edge are archived each checkpoint.  So 
> issue a bunch of "CHECKPOINT;" commands for a while and see if that clears 
> it up.

Indeed, forcing a bunch of CHECKPOINTS started to get things moving again.
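The root cause in this thread was an archive command whose failures went unnoticed. What matters to PostgreSQL is the exit status: the command should exit zero only when the segment really was archived, and on a non-zero exit PostgreSQL keeps the segment and retries. Below is a minimal sketch of the core of a hypothetical replacement script (the name archive_wal.py and the archive directory are assumptions, not anything from this thread). It mirrors the `test ! -f archive/%f && cp %p archive/%f` pattern and reports errors on stderr instead of discarding them. It does not fsync the copy, which a production script should.

```python
import os
import shutil
import sys


def archive_segment(src_path: str, wal_name: str, archive_dir: str) -> int:
    """Copy one WAL segment into archive_dir.

    Returns 0 only when the copy succeeded; a non-zero value tells
    PostgreSQL the segment was NOT archived, so it keeps it and retries.
    """
    dest = os.path.join(archive_dir, wal_name)
    # Like `test ! -f`, refuse to overwrite a segment that was already archived.
    if os.path.exists(dest):
        print(f"archive_wal: {dest} already exists", file=sys.stderr)
        return 1
    tmp = dest + ".tmp"
    try:
        # Copy under a temporary name, then rename, so a partial copy is
        # never left behind under the final segment name.
        shutil.copyfile(src_path, tmp)
        os.replace(tmp, dest)
    except OSError as exc:
        # Report the error rather than redirecting it to /dev/null.
        print(f"archive_wal: could not archive {wal_name}: {exc}", file=sys.stderr)
        return 1
    return 0


def main(argv: list) -> int:
    """Entry point for a hypothetical call such as:
    archive_command = 'python3 /path/to/archive_wal.py %p %f /path/to/archive'
    """
    if len(argv) != 4:
        print("usage: archive_wal.py SRC_PATH WAL_NAME ARCHIVE_DIR", file=sys.stderr)
        return 2
    return archive_segment(argv[1], argv[2], argv[3])

# As a standalone script, finish with:
#   if __name__ == "__main__": sys.exit(main(sys.argv))
```

The temporary-name-then-rename step is a design choice: if the copy dies halfway, the archive never contains a truncated file under the real segment name, so the retry is not blocked by the "already exists" check.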

> 
> Actually my description is rather garbled, mixing up what I saw when 
> wal_keep_segments was lowered, not when recovering from a long lasting 
> archive failure.  Nevertheless, checkpoints are what provoke the removal of 
> excessive WAL files.  Are you logging checkpoints?  What do they say?  Also, 
> what is in pg_xlog/archive_status ?
>  

I do log checkpoints, but most of them recycle and don’t remove:
Mar 26 16:09:36 prd-db1a postgres[29161]: [221-1] db=,user= LOG:  checkpoint 
complete: wrote 177293 buffers (4.2%); 0 transaction log file(s) added, 0 
removed, 56 recycled; write=539.838 s, sync=0.049 s, total=539.909 s; sync 
files=342, longest=0.015 s, average=0.000 s
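To see whether checkpoints are actually shrinking pg_xlog, the useful fields in a log line like the one above are the added / removed / recycled counts. A minimal sketch that pulls them out with a regular expression, assuming the 9.3 wording shown above ("transaction log file(s) added"); other versions may phrase the message differently, and the helper names are hypothetical:

```python
import re
from typing import Dict, Iterable, Optional

# Matches the WAL-file counters in a 9.3 "checkpoint complete" log message.
CHECKPOINT_RE = re.compile(
    r"checkpoint complete: .*?"
    r"(?P<added>\d+) transaction log file\(s\) added, "
    r"(?P<removed>\d+) removed, "
    r"(?P<recycled>\d+) recycled"
)


def wal_file_counts(line: str) -> Optional[Dict[str, int]]:
    """Return added/removed/recycled counts, or None for other log lines."""
    m = CHECKPOINT_RE.search(line)
    if m is None:
        return None
    return {key: int(value) for key, value in m.groupdict().items()}


def totals(lines: Iterable[str]) -> Dict[str, int]:
    """Sum the counters over many log lines (e.g. a day of syslog)."""
    total = {"added": 0, "removed": 0, "recycled": 0}
    for line in lines:
        counts = wal_file_counts(line)
        if counts is not None:
            for key, value in counts.items():
                total[key] += value
    return total
```

A run of checkpoints reporting "0 removed" means the files are being reused rather than deleted, so the directory will not get smaller from those checkpoints alone.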

That said, after letting the db run / checkpoint / archive overnight, the xlog 
did indeed start to slowly shrink.  The pace at which it is shrinking is 
somewhat unsatisfying, but at least we are making progress now!
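One way to track that progress, and to answer the question about pg_xlog/archive_status, is to count segment files alongside the archive_status markers: a `<segment>.ready` file means the segment is still waiting for the archive command, and `<segment>.done` means it has been archived and a later checkpoint may recycle or remove it. A minimal sketch, assuming it runs as a user that can read the data directory; the path in the usage note is a hypothetical example, and in PostgreSQL 10 and later the directory is named pg_wal rather than pg_xlog:

```python
import os
import re
from typing import Dict

# WAL segment files are named with 24 hexadecimal characters.
SEGMENT_RE = re.compile(r"[0-9A-F]{24}")


def xlog_backlog(xlog_dir: str) -> Dict[str, int]:
    """Summarize pg_xlog: segment count and bytes, plus archive_status markers."""
    summary = {"segments": 0, "segment_bytes": 0, "ready": 0, "done": 0}
    for name in os.listdir(xlog_dir):
        path = os.path.join(xlog_dir, name)
        if SEGMENT_RE.fullmatch(name) and os.path.isfile(path):
            summary["segments"] += 1
            summary["segment_bytes"] += os.path.getsize(path)
    status_dir = os.path.join(xlog_dir, "archive_status")
    for name in os.listdir(status_dir):
        # .ready: not yet archived; .done: archived, eligible for recycling/removal.
        if name.endswith(".ready"):
            summary["ready"] += 1
        elif name.endswith(".done"):
            summary["done"] += 1
    return summary
```

Calling `xlog_backlog("/var/lib/postgresql/9.3/main/pg_xlog")` every few minutes should show the `ready` count falling as the archive catches up, and `segments` falling as checkpoints start removing files.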

I guess if I had just been patient I could have saved some mailing list 
traffic.  But patience is hard when your production database system is running 
at 0% free disk :)

Thanks, everyone, for the help. If the log continues to shrink, I should be out 
of the woods now.

Best,
Steven



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
