Hello Andres, here is the process, which is now actually not using any CPU at all; shared_buffers is set to 2GB:

50978 postgres  20   0 2288m 2.0g 2.0g S  0.0  1.6   4225:34 postgres: startup process   recovering 000000050000262E000000FD

It has been hanging on that file for several minutes now, and here is the strace:

$ strace -c -f -p 50978
Process 50978 attached - interrupt to quit
Process 50978 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.82    0.007999          37       215           select
  2.73    0.000230           1       215           getppid
  2.45    0.000207           1       215       215 stat
------ ----------- ----------- --------- --------- ----------------
100.00    0.008436                   645       215 total

What kind of additional profiling information would you like to see?

Regards,

-- 
Valentin

On Wed, Aug 15, 2012 at 4:09 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> Hi,
>
> On Wednesday, August 15, 2012 12:10:42 PM val...@gmail.com wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference:      7494
> > Logged by:          Valentine Gogichashvili
> > Email address:      val...@gmail.com
> > PostgreSQL version: 9.0.7
> > Operating system:   Linux version 2.6.32-5-amd64 (Debian 2.6.32-41)
> > Description:
> >
> > We are experiencing strange(?) behavior on the replication slave machines.
> > The master machine has a very heavy update load, where many processes are
> > updating lots of data. It generates up to 30GB of WAL files per hour.
> > Normally it is not a problem for the slave machines to replay this amount
> > of WAL files in time and keep up with the master. But at some moments the
> > slaves "hang" with 100% CPU usage on the WAL replay process and 3% IOWait,
> > needing up to 30 seconds to process one WAL file. Once this tipping point
> > is reached, a huge WAL replication lag builds up quite fast, which also
> > overfills the XLOG directory on the slave machines, as the WAL receiver
> > puts the WAL files it gets via streaming replication into the XLOG
> > directory (which in many cases is a separate disk partition of quite
> > limited size).
>
> Could you try to get a profile of that 100% cpu time?
>
> Greetings,
>
> Andres
> --
> Andres Freund                     http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
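
The lag described in the quoted report can be checked with a little arithmetic. This is only an illustrative sketch, assuming the default 16 MB WAL segment size and 1 GB = 1024 MB; the function name replay_budget is hypothetical. It computes how many segments the master produces per hour at 30GB/hour, how long the standby may spend per segment on average to keep up, and how fast the lag grows when a segment takes 30 seconds to replay.

```python
# Assumption: default 16 MB WAL segment size; 1 GB taken as 1024 MB.
SEGMENT_MB = 16


def replay_budget(wal_gb_per_hour: float, seconds_per_segment: float):
    """Hypothetical helper: return (segments generated per hour,
    average seconds available per segment to keep up,
    lag growth in GB per hour at the given replay speed)."""
    generated_mb = wal_gb_per_hour * 1024
    segments_per_hour = generated_mb / SEGMENT_MB
    budget_s = 3600 / segments_per_hour              # time allowed per segment
    replayed_per_hour = 3600 / seconds_per_segment   # segments the standby manages
    lag_segments = max(0.0, segments_per_hour - replayed_per_hour)
    lag_gb_per_hour = lag_segments * SEGMENT_MB / 1024
    return segments_per_hour, budget_s, lag_gb_per_hour


segs, budget, lag = replay_budget(30, 30)
print(f"{segs:.0f} segments/hour, {budget:.3f} s per segment budget, "
      f"lag grows {lag:.3f} GB/hour")
```

Under these assumptions the master produces 1920 segments per hour, so each must be replayed in about 1.9 seconds on average; at 30 seconds per segment the standby falls behind by roughly 28GB every hour, which is consistent with the XLOG partition filling up quickly.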