[PERFORM] Sudden crazy high CPU usage

2014-03-31 Thread Niels Kristian Schjødt
I'm running postgresql 9.3 on a production server. An hour ago, out of the 
"blue", I ran into an issue I have never encountered before: my server started 
to use CPU as crazy. The server is a standard ubuntu 12.04 LTE installation 
running only Postgres and Redis.

The incident can be seen on the in numbers below: 

https://s3-eu-west-1.amazonaws.com/autouncle-public/other/cpu.png

I imidiatly took a look at pg_stat_activity but nothing in there seemed 
suspicious. I also had a look at the postgres log, but nothing was in there 
too. I have pg_stat_statements running, so I reseted that one, and nothing 
really suspicious occurred in there, expect for the fact, that all queries were 
taking 100x times longer than usual.

I have tried the following with no luck:

• Restart clients connecting to the db
• Restart postgres
• Restart the whole server

I have run memory tests on the server as well, and nothing seems to be wrong.

No changes in any software running on the servers has been made within the last 
24 hours.

The question is: I have a streaming replication server running, which I have 
now done a failover to, and it runs fine. However I still have no clue why my 
master suddenly has become so CPU consuming, and how I can debug / trace it 
further down?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Count-Distinct Query

2014-03-31 Thread Shaun Thomas
>  tl;dr - How can I speed up my count-distinct query?  

You can't.

Doing a count(distinct x) is much different than a count(1), which can simply 
scan available indexes. To build a distinct, it has to construct an in-memory 
hash of every valid email, and count the distinct values therein. This will 
pretty much never be fast, especially with 2M rows involved.

I could be wrong about this, and the back-end folks might have a different 
answer, but I wouldn't hold my breath.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd | Suite 400 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sudden crazy high CPU usage

2014-03-31 Thread Merlin Moncure
On Mon, Mar 31, 2014 at 5:25 AM, Niels Kristian Schjødt
 wrote:
> I'm running postgresql 9.3 on a production server. An hour ago, out of the 
> "blue", I ran into an issue I have never encountered before: my server 
> started to use CPU as crazy. The server is a standard ubuntu 12.04 LTE 
> installation running only Postgres and Redis.
>
> The incident can be seen on the in numbers below:
>
> https://s3-eu-west-1.amazonaws.com/autouncle-public/other/cpu.png
>
> I imidiatly took a look at pg_stat_activity but nothing in there seemed 
> suspicious. I also had a look at the postgres log, but nothing was in there 
> too. I have pg_stat_statements running, so I reseted that one, and nothing 
> really suspicious occurred in there, expect for the fact, that all queries 
> were taking 100x times longer than usual.
>
> I have tried the following with no luck:
>
> * Restart clients connecting to the db
> * Restart postgres
> * Restart the whole server
>
> I have run memory tests on the server as well, and nothing seems to be wrong.
>
> No changes in any software running on the servers has been made within the 
> last 24 hours.
>
> The question is: I have a streaming replication server running, which I have 
> now done a failover to, and it runs fine. However I still have no clue why my 
> master suddenly has become so CPU consuming, and how I can debug / trace it 
> further down?

Using linux 6? One possible culprit is "Transparent Huge Page
Compaction".  It tends to hit severs with a lot of memory, especially
if they've configured a lot of shared buffers.  Google it a for a lot
of info.

There may be other issues masquerading as this one but it's the first
thing to rule out.  Symptoms are very high cpu utilization and poor
performance that strikes without warning and then resolves also
without warning (typically seconds or minutes after the event).

For starters, take a look at the value of:

/sys/kernel/mm/redhat_transparent_hugepage/enabled

And do some due diligence research.

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Count-Distinct Query

2014-03-31 Thread Tom Lane
Christopher Jackson  writes:
>   tl;dr - How can I speed up my count-distinct query?

EXPLAIN doesn't provide a lot of visibility into what the Aggregate plan
node is doing, but in this case what it's doing is an internal sort/uniq
operation to implement the DISTINCT.  You didn't say what value of
work_mem you're using, but it'd need to be probably 50-100MB to prevent
that sort from spilling to disk (and therefore being slow).

Note that the indexscan is actually *slower* than the seqscan so far as
the table access is concerned; if the table were big enough to not fit
in RAM, this would get very much worse.  So I'm not impressed with trying
to force the optimizer's hand as you've done here --- it might be a nice
plan now, but it's brittle.  See if a bigger work_mem improves matters
enough with the regular plan.

> *I'm concerned about setting the enable_bitmapscan and seq_page_cost values
> because I'm not yet sure what the consequences are.  Can anyone enlighten
> me on the recommended way to speed up this query?*

Turning off enable_bitmapscan globally would be a seriously bad idea.
Changing the cost settings to these values globally might be all right;
it would amount to optimizing for all-in-memory cases, which might or
might not be a good idea for your situation.  For that matter, greatly
increasing work_mem globally is usually not thought to be smart either;
remember that it's a per-sort-operation setting so you may need to
provision a considerable multiple of the setting as physical RAM,
depending on how many queries you expect to run concurrently.  So all in
all you might be well advised to just set special values for this one
query, whichever solution approach you use.

I doubt you need the "where email=email" hack, in any case.  That isn't
forcing the optimizer's decision in any meaningful fashion.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sudden crazy high CPU usage

2014-03-31 Thread Niels Kristian Schjødt
Thanks, this seems to persist after a reboot of the server though, and I have never in my server’s 3 months life time experienced anything like it.
Niels Kristian SchjødtCo-founder & DeveloperE-Mail: nielskrist...@autouncle.comMobile: 0045 28 73 04 93www.autouncle.comFollow us: Facebook  |  Google+  |  LinkedIn  |  Twitter  Get app for: iPhone & iPad  |  Android

Den 31/03/2014 kl. 15.47 skrev Merlin Moncure :On Mon, Mar 31, 2014 at 5:25 AM, Niels Kristian Schjødt wrote:I'm running postgresql 9.3 on a production server. An hour ago, out of the "blue", I ran into an issue I have never encountered before: my server started to use CPU as crazy. The server is a standard ubuntu 12.04 LTE installation running only Postgres and Redis.The incident can be seen on the in numbers below:https://s3-eu-west-1.amazonaws.com/autouncle-public/other/cpu.pngI imidiatly took a look at pg_stat_activity but nothing in there seemed suspicious. I also had a look at the postgres log, but nothing was in there too. I have pg_stat_statements running, so I reseted that one, and nothing really suspicious occurred in there, expect for the fact, that all queries were taking 100x times longer than usual.I have tried the following with no luck:    * Restart clients connecting to the db    * Restart postgres    * Restart the whole serverI have run memory tests on the server as well, and nothing seems to be wrong.No changes in any software running on the servers has been made within the last 24 hours.The question is: I have a streaming replication server running, which I have now done a failover to, and it runs fine. However I still have no clue why my master suddenly has become so CPU consuming, and how I can debug / trace it further down?Using linux 6? One possible culprit is "Transparent Huge PageCompaction".  It tends to hit severs with a lot of memory, especiallyif they've configured a lot of shared buffers.  Google it a for a lotof info.There may be other issues masquerading as this one but it's the firstthing to rule out.  Symptoms are very high cpu utilization and poorperformance that strikes without warning and then resolves alsowithout warning (typically seconds or minutes after the event).For starters, take a look at the value of:/sys/kernel/mm/redhat_transparent_hugepage/enabledAnd do some due diligence research.merlin

Re: [PERFORM] Sudden crazy high CPU usage

2014-03-31 Thread Merlin Moncure
On Mon, Mar 31, 2014 at 9:24 AM, Niels Kristian Schjødt <
nielskrist...@autouncle.com> wrote:

> Thanks, this seems to persist after a reboot of the server though, and I
> have never in my server's 3 months life time experienced anything like it.
>


huh.  Any chance of getting 'perf' installed and running a perf top?

merlin


Re: [PERFORM] Sudden crazy high CPU usage

2014-03-31 Thread Scott Marlowe
On Mon, Mar 31, 2014 at 8:24 AM, Niels Kristian Schjødt
 wrote:
>
> Thanks, this seems to persist after a reboot of the server though, and I have 
> never in my server's 3 months life time experienced anything like it.

Could it be overheating and therefore throttling the cores?

Also another thing to look at on large memory machines with > 1 CPU
socket is zone_reclaim_mode being set to 1. Always set it to 0 on a
linux machine running postgres.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Count-Distinct Query

2014-03-31 Thread Christopher Jackson
   Tom and Shawn,

   Thanks for the feedback.  This has been helpful.  It's worth noting that
I was spiking this out on my local box using default memory utilization
settings.  I'll revisit this once we get our production box set up.  It's
good to know what the best practices are around the enable_bitmapscan and
seq_page_cost settings are.  Also, it's good to know that my hack wasn't
actually yielding anything.  I'll check back in once our production
environment is up and running.  For what it's worth, we're using Heroku and
we're thinking of going with the Standard Tengu tier as a start.  This will
give us 1.7GB of RAM, so hopefully bumping up the work_mem setting
shouldn't be a problem.  Does that make sense?

  Thanks for the help,
 Chris


On Mon, Mar 31, 2014 at 9:15 AM, Tom Lane  wrote:

> Christopher Jackson  writes:
> >   tl;dr - How can I speed up my count-distinct query?
>
> EXPLAIN doesn't provide a lot of visibility into what the Aggregate plan
> node is doing, but in this case what it's doing is an internal sort/uniq
> operation to implement the DISTINCT.  You didn't say what value of
> work_mem you're using, but it'd need to be probably 50-100MB to prevent
> that sort from spilling to disk (and therefore being slow).
>
> Note that the indexscan is actually *slower* than the seqscan so far as
> the table access is concerned; if the table were big enough to not fit
> in RAM, this would get very much worse.  So I'm not impressed with trying
> to force the optimizer's hand as you've done here --- it might be a nice
> plan now, but it's brittle.  See if a bigger work_mem improves matters
> enough with the regular plan.
>
> > *I'm concerned about setting the enable_bitmapscan and seq_page_cost
> values
> > because I'm not yet sure what the consequences are.  Can anyone enlighten
> > me on the recommended way to speed up this query?*
>
> Turning off enable_bitmapscan globally would be a seriously bad idea.
> Changing the cost settings to these values globally might be all right;
> it would amount to optimizing for all-in-memory cases, which might or
> might not be a good idea for your situation.  For that matter, greatly
> increasing work_mem globally is usually not thought to be smart either;
> remember that it's a per-sort-operation setting so you may need to
> provision a considerable multiple of the setting as physical RAM,
> depending on how many queries you expect to run concurrently.  So all in
> all you might be well advised to just set special values for this one
> query, whichever solution approach you use.
>
> I doubt you need the "where email=email" hack, in any case.  That isn't
> forcing the optimizer's decision in any meaningful fashion.
>
> regards, tom lane
>


Re: [PERFORM] Sudden crazy high CPU usage

2014-03-31 Thread Niels Kristian Schjødt
Yes, I could install “perf”, though I’m not familiar with it. What would i do? :-)
Niels Kristian SchjødtCo-founder & DeveloperE-Mail: nielskrist...@autouncle.comMobile: 0045 28 73 04 93www.autouncle.comFollow us: Facebook  |  Google+  |  LinkedIn  |  Twitter  Get app for: iPhone & iPad  |  Android

Den 31/03/2014 kl. 16.36 skrev Merlin Moncure :perf

Re: [PERFORM] Sudden crazy high CPU usage

2014-03-31 Thread Niels Kristian Schjødt
Thanks, I don’t think overheating is an issue, it’s a large dell server, and I 
have checked the historic CPU temperature in the servers control panel, and no 
overheating has shown.

Zone_reclaim_mode is already set to 0 

Den 31/03/2014 kl. 16.50 skrev Scott Marlowe :

> On Mon, Mar 31, 2014 at 8:24 AM, Niels Kristian Schjødt
>  wrote:
>> 
>> Thanks, this seems to persist after a reboot of the server though, and I 
>> have never in my server's 3 months life time experienced anything like it.
> 
> Could it be overheating and therefore throttling the cores?
> 
> Also another thing to look at on large memory machines with > 1 CPU
> socket is zone_reclaim_mode being set to 1. Always set it to 0 on a
> linux machine running postgres.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sudden crazy high CPU usage

2014-03-31 Thread Sergey Konoplev
On Mon, Mar 31, 2014 at 3:25 AM, Niels Kristian Schjødt
 wrote:
> I'm running postgresql 9.3 on a production server. An hour ago, out of the 
> "blue", I ran into an issue I have never encountered before: my server 
> started to use CPU as crazy. The server is a standard ubuntu 12.04 LTE 
> installation running only Postgres and Redis.
>
> The incident can be seen on the in numbers below:
>
> https://s3-eu-west-1.amazonaws.com/autouncle-public/other/cpu.png

The increase doesn't look so sudden. My guess is that the server got
some new activity. The advice is to setup the statistics collecting
script by the link [1] and review the results for a period of hour or
so. It shows charts of statements by CPU/IO/calls with aggregated
stats, so you could probably find out more than with pure
pg_stat_statements.

[1] 
https://github.com/grayhemp/pgcookbook/blob/master/statement_statistics_collecting_and_reporting.md

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sudden crazy high CPU usage

2014-03-31 Thread Will Platnick
In New Relic, go back a half hour before the problem started so you can't
see that this spike happened and send the same screenshot in. My guess is
you have increased activity hitting the DB. Do you have pgbouncer or some
kind of connection pooling sitting in front? 198 open server connections
could account for an increase in load like you're seeing. Do you have
postgresql addon in New Relic to show you how many queries are hitting the
system to correlate data to?

On Mon, Mar 31, 2014 at 1:36 PM, Sergey Konoplev  wrote:

> On Mon, Mar 31, 2014 at 3:25 AM, Niels Kristian Schjødt
>  wrote:
> > I'm running postgresql 9.3 on a production server. An hour ago, out of
> the "blue", I ran into an issue I have never encountered before: my server
> started to use CPU as crazy. The server is a standard ubuntu 12.04 LTE
> installation running only Postgres and Redis.
> >
> > The incident can be seen on the in numbers below:
> >
> > https://s3-eu-west-1.amazonaws.com/autouncle-public/other/cpu.png
>
> The increase doesn't look so sudden. My guess is that the server got
> some new activity. The advice is to setup the statistics collecting
> script by the link [1] and review the results for a period of hour or
> so. It shows charts of statements by CPU/IO/calls with aggregated
> stats, so you could probably find out more than with pure
> pg_stat_statements.
>
> [1]
> https://github.com/grayhemp/pgcookbook/blob/master/statement_statistics_collecting_and_reporting.md
>
> --
> Kind regards,
> Sergey Konoplev
> PostgreSQL Consultant and DBA
>
> http://www.linkedin.com/in/grayhemp
> +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
> gray...@gmail.com
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>