Re: [GENERAL] Server hitting 100% CPU usage, system comes to a crawl.

Scott Mead Thu, 27 Oct 2011 14:34:49 -0700

On Thu, Oct 27, 2011 at 2:39 PM, Brian Fehrle <bri...@consistentstate.com>wrote:


> Hi all, need some help/clues on tracking down a performance issue.
>
> PostgreSQL version: 8.3.11
>
> I've got a system that has 32 cores and 128 gigs of ram. We have connection
> pooling set up, with about 100 - 200 persistent connections open to the
> database. Our applications then use these connections to query the database
> constantly, but when a connection isn't currently executing a query, it's
> <IDLE>. On average, at any given time, there are 3 - 6 connections that are
> actually executing a query, while the rest are <IDLE>.
>

Remember, when you read pg_stat_activity, it is showing you query activity
from that exact specific moment in time.  Just because it looks like only
3-6 connections are executing, doesn't mean that 200 aren't actually
executing < .1ms statements.  With such a beefy box, I would see if you can
examine any stats from your connection pooler to find out how many
connections are actually getting used.



>
> About once a day, queries that normally take just a few seconds slow way
> down, and start to pile up, to the point where instead of just having 3-6
> queries running at any given time, we get 100 - 200. The whole system comes
> to a crawl, and looking at top, the CPU usage is 99%.
>
> Looking at top, I see no SWAP usage, very little IOWait, and there are a
> large number of postmaster processes at 100% cpu usage (makes sense, at this
> point there are 150 or so queries currently executing on the database).
>
>  Tasks: 713 total,  44 running, 668 sleeping,   0 stopped,   1 zombie
> Cpu(s):  4.4%us, 92.0%sy,  0.0%ni,  3.0%id,  0.0%wa,  0.0%hi,  0.3%si,
>  0.2%st
> Mem:  134217728k total, 131229972k used,  2987756k free,   462444k buffers
> Swap:  8388600k total,      296k used,  8388304k free, 119029580k cached
>
>
> In the past, we noticed that autovacuum was hitting some large tables at
> the same time this happened, so we turned autovacuum off to see if that was
> the issue, and it still happened without any vacuums running.
>
That was my next question :)

>
> We also ruled out checkpoints being the cause.
>
> How exactly did you rule this out?  Just because a checkpoint is over
doesn't mean that it hasn't had a negative effect on the OS cache.  If
you're stuck going to disk, that could be hurting you (that being said, you
do point to a low I/O wait above, so you're probably correct in ruling this
out).


>
> I'm currently digging through some statistics I've been gathering to see if
> traffic increased at all, or remained the same when the slowdown occurred.
> I'm also digging through the logs from the postgresql cluster (I increased
> verbosity yesterday), looking for any clues. Any suggestions or clues on
> where to look for this to see what can be causing a slowdown like this would
> be greatly appreciated.
>
> Are you capturing table-level stats from pg_stat_user_[tables | indexes]?
Just because a server doesn't look busy doesn't mean that you're not doing
1000 index scans per second returning 1000 tuples each time.

--Scott


> Thanks,
>    - Brian F
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/**mailpref/pgsql-general<http://www.postgresql.org/mailpref/pgsql-general>
>

Re: [GENERAL] Server hitting 100% CPU usage, system comes to a crawl.

Reply via email to