Re: [BUGS] BUG #7494: WAL replay speed depends heavily on the shared_buffers size

Valentine Gogichashvili Thu, 16 Aug 2012 07:53:49 -0700

Hello Andres,

Here is the process, which is now actually not using any CPU at all, with
shared_buffers set to 2GB:


50978 postgres  20   0 2288m 2.0g 2.0g S  0.0  1.6   4225:34 postgres: startup process   recovering 000000050000262E000000FD

It has been hanging on that file for several minutes now.
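The file name in the top output encodes where replay is stuck. A minimal Python sketch for decoding it, assuming the pre-9.3 naming scheme (8 hex digits each for timeline, log id and segment number) and the default 16MB segment size. The function name `decode_wal_name` is hypothetical:

```python
# Sketch: decode a pre-9.3 PostgreSQL WAL segment file name.
# Assumption: default 16 MB WAL segments (compile-time default in 9.0).
SEG_SIZE = 16 * 1024 * 1024  # assumed WAL segment size in bytes


def decode_wal_name(name: str) -> dict:
    """Split a 24-hex-digit WAL file name into timeline, log id and segment."""
    if len(name) != 24:
        raise ValueError("WAL file names are 24 hex digits")
    tli = int(name[0:8], 16)    # timeline ID
    log = int(name[8:16], 16)   # log id: high 32 bits of the WAL position
    seg = int(name[16:24], 16)  # segment number within that log id
    # Position of the first byte of this segment, in log/offset notation.
    start = f"{log:X}/{seg * SEG_SIZE:08X}"
    return {"timeline": tli, "log": log, "segment": seg, "start": start}


info = decode_wal_name("000000050000262E000000FD")
print(info)
```

Under these assumptions, the startup process is stuck on timeline 5 at the segment that begins at position 262E/FD000000.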

And here is the strace summary:

$ strace -c -f -p 50978
Process 50978 attached - interrupt to quit
 Process 50978 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.82    0.007999          37       215           select
  2.73    0.000230           1       215           getppid
  2.45    0.000207           1       215       215 stat
------ ----------- ----------- --------- --------- ----------------
100.00    0.008436                   645       215 total
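Each of the three syscalls appears exactly 215 times, and every stat call failed. This is consistent with a loop that sleeps (select), checks its parent (getppid) and probes a file that does not exist (stat) once per iteration, rather than doing replay work. A plain `strace -p` run without `-c` would show which path stat is checking. The sketch below parses the summary table and checks those counts. The function name `parse_strace_summary` is hypothetical, and it only handles the column layout shown above:

```python
# Sketch: parse the per-syscall rows of an `strace -c` summary table.
SUMMARY = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.82    0.007999          37       215           select
  2.73    0.000230           1       215           getppid
  2.45    0.000207           1       215       215 stat
------ ----------- ----------- --------- --------- ----------------
100.00    0.008436                   645       215 total
"""


def parse_strace_summary(text: str) -> dict:
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "% time" header and the dashed separators.
        if not fields or fields[0] == "%" or fields[0].startswith("-"):
            continue
        name = fields[-1]
        if name == "total":
            continue  # the summary line is not a syscall
        # Rows have 6 fields when the errors column is filled in, else 5.
        errors = int(fields[4]) if len(fields) == 6 else 0
        rows[name] = {
            "seconds": float(fields[1]),
            "calls": int(fields[3]),
            "errors": errors,
        }
    return rows


rows = parse_strace_summary(SUMMARY)
total_calls = sum(r["calls"] for r in rows.values())
print(total_calls)
print({name: r["calls"] for name, r in rows.items()})
print(rows["stat"]["errors"] == rows["stat"]["calls"])
```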

What kind of additional profiling information would you like to see?

Regards,

-- Valentin


On Wed, Aug 15, 2012 at 4:09 PM, Andres Freund <and...@2ndquadrant.com> wrote:

> Hi,
>
> On Wednesday, August 15, 2012 12:10:42 PM val...@gmail.com wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference:      7494
> > Logged by:          Valentine Gogichashvili
> > Email address:      val...@gmail.com
> > PostgreSQL version: 9.0.7
> > Operating system:   Linux version 2.6.32-5-amd64 (Debian 2.6.32-41)
> > Description:
> >
> > We are experiencing strange(?) behavior on the replication slave machines.
> > The master machine has a very heavy update load, where many processes are
> > updating lots of data. It generates up to 30GB of WAL files per hour.
> > Normally it is not a problem for the slave machines to replay this amount
> > of WAL in time and keep up with the master. But at some moments the
> > slaves are “hanging” with 100% CPU usage on the WAL replay process and 3%
> > IOWait, needing up to 30 seconds to process one WAL file. Once this tipping
> > point is reached, a huge WAL replication lag builds up quite fast. This
> > also overfills the XLOG directory on the slave machines, because the WAL
> > receiver puts the WAL files it gets via streaming replication into the
> > XLOG directory (which in many cases is a separate disk partition of quite
> > limited size).
> Could you try to get a profile of that 100% CPU time?
>
> Greetings,
>
> Andres
> --
> Andres Freund           http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
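A rough calculation from the figures in the quoted report shows why 30 seconds per file is a tipping point. This is a back-of-the-envelope sketch that assumes the default 16MB WAL segment size and 1GB = 1024MB. Neither figure is stated in the report:

```python
# Back-of-the-envelope replay backlog, using figures from the bug report.
# Assumptions: default 16 MB WAL segments; 1 GB = 1024 MB.
SEG_MB = 16
wal_gb_per_hour = 30        # "up to 30GB of WAL files per hour"
slow_secs_per_segment = 30  # "up to 30 seconds to process one WAL file"

arrival_per_hour = wal_gb_per_hour * 1024 / SEG_MB     # segments produced per hour
replay_per_hour = 3600 / slow_secs_per_segment         # segments replayed per hour when slow
backlog_per_hour = arrival_per_hour - replay_per_hour  # net growth of unreplayed segments
backlog_gb_per_hour = backlog_per_hour * SEG_MB / 1024
budget_secs = 3600 / arrival_per_hour                  # max replay time per segment to keep up

print(arrival_per_hour, replay_per_hour, backlog_per_hour)
print(f"{backlog_gb_per_hour:.1f} GB/hour of backlog; "
      f"keep-up budget {budget_secs:.3f} s/segment")
```

At peak load the slave has to replay each segment in under about 2 seconds. At 30 seconds per segment, almost all of the incoming WAL piles up in the XLOG directory.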
