[PERFORM] cpu throttling

2007-08-02 Thread Bryan Murphy
We have a complicated stored procedure that we run frequently.  It
pegs one of our postmaster processes at 100% CPU utilization for a few
hours.  This has the unfortunate side effect of causing increased
latency for our other queries.  We are currently planning a fix, but
because of the complicated nature of this procedure it is going to
take some time to implement.

I've noticed that if I renice the process that is running the query,
the other postmaster processes are able to respond to our other
queries in a timely fashion.

My question:  Is there a way I can decrease the priority of a specific
query, or determine the PID of the process it is running in?  I'd like
to throw together a quick shell script if at all possible, as right
now I have to monitor the process manually and we'll have fixed the
problem long before we have the chance to implement proper database
clustering.

Bryan

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] cpu throttling

2007-08-02 Thread Alan Hodgson
On Thursday 02 August 2007 09:02, "Bryan Murphy" <[EMAIL PROTECTED]> 
wrote:
> My question:  Is there a way I can decrease the priority of a specific
> query, or determine the PID of the process it is running in?  I'd like
> to throw together a quick shell script if at all possible, as right
> now I have to monitor the process manually and we'll have fixed the
> problem long before we have the chance to implement proper database
> clustering.

select procpid from pg_stat_activity
 where current_query like '%stored_proc%'
   and current_query not like '%pg_stat_activity%';

This requires stats_command_string to be enabled.

I'm surprised your operating system doesn't automatically lower the priority 
of the process, though ..
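
To Bryan's scripting question: a minimal watcher along these lines could locate and renice the offending backend. This is only a sketch under assumptions — the database name ("mydb"), the procedure-name pattern ("stored_proc"), and the nice value are placeholders to adjust, and stats_command_string must be on:

```shell
#!/bin/sh
# Sketch: find the backend running the long stored procedure and lower its
# scheduling priority.  "mydb" and "stored_proc" are hypothetical names.

find_busy_pid() {
  # -A: unaligned, -t: tuples only, so the output is just the bare PID
  psql -At -d "$1" -c "
    select procpid from pg_stat_activity
     where current_query like '%stored_proc%'
       and current_query not like '%pg_stat_activity%';" 2>/dev/null || true
}

pid=$(find_busy_pid mydb)
if [ -n "$pid" ]; then
  renice 10 -p "$pid"   # needs permission to renice the postmaster's children
fi
```

Run from cron every minute or so until the procedure is fixed; if no matching backend is found, the script does nothing.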

-- 
"Remember when computers were frustrating because they did exactly what
you told them to?  That actually seems sort of quaint now." --J.D. Baldwin




Re: [PERFORM] cpu throttling

2007-08-02 Thread Bryan Murphy
It's a four-processor Intel Xeon machine with more than enough RAM.  The
entire database fits in memory, and while the CPU is pegged, nothing is
chewing up I/O bandwidth and nothing is getting swapped out of RAM.

I'm running Debian stable with only a few tweaks to the kernel's
memory settings.  As far as I'm aware, I have not changed anything
that would impact scheduling.

Other queries do respond, but every couple of seconds a query that
normally takes 300 ms will take 8000 ms instead.  Nothing terrible, but
enough that our users notice.

Bryan

On 8/2/07, Alan Hodgson <[EMAIL PROTECTED]> wrote:
> On Thursday 02 August 2007 09:02, "Bryan Murphy" <[EMAIL PROTECTED]>
> wrote:
> > My question:  Is there a way I can decrease the priority of a specific
> > query, or determine the PID of the process it is running in?  I'd like
> > to throw together a quick shell script if at all possible, as right
> > now I have to monitor the process manually and we'll have fixed the
> > problem long before we have the chance to implement proper database
> > clustering.
>
> select procpid from pg_stat_activity  where current_query
>like '%stored_proc%' and current_query not like '%pg_stat_activity%';
>
> requires stats_command_string to be enabled
>
> I'm surprised your operating system doesn't automatically lower the priority
> of the process, though ..
>
> --
> "Remember when computers were frustrating because they did exactly what
> you told them to?  That actually seems sort of quaint now." --J.D. Baldwin
>

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] Why are distinct and group by choosing different plans?

2007-08-02 Thread Ron Mayer
I notice that I get different plans when I run the
following two queries that I thought would be
identical.

  select distinct test_col from mytable;
  select test_col from mytable group by test_col;

Any reason why it favors one in one case but not the other?



d=# explain analyze select distinct test_col from mytable;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..14927.69 rows=27731 width=4) (actual time=0.144..915.214 rows=208701 loops=1)
   ->  Index Scan using "mytable(test_col)" on mytable  (cost=0.00..14160.38 rows=306925 width=4) (actual time=0.140..575.580 rows=306925 loops=1)
 Total runtime: 1013.657 ms
(3 rows)

d=# explain analyze select test_col from mytable group by test_col;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=7241.56..7518.87 rows=27731 width=4) (actual time=609.058..745.295 rows=208701 loops=1)
   ->  Seq Scan on mytable  (cost=0.00..6474.25 rows=306925 width=4) (actual time=0.063..280.000 rows=306925 loops=1)
 Total runtime: 840.321 ms
(3 rows)


d=# select version();
                                                     version
-----------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.2.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
(1 row)




Re: [PERFORM] Why are distinct and group by choosing different plans?

2007-08-02 Thread Gregory Stark
"Ron Mayer" <[EMAIL PROTECTED]> writes:

> I notice that I get different plans when I run the
> following two queries that I thought would be
> identical.
>
>   select distinct test_col from mytable;
>   select test_col from mytable group by test_col;
>
> Any reason why it favors one in one case but not the other?

I think "distinct" just doesn't know about hash aggregates yet. That's partly
an oversight and partly a "feature", in that it gives a convenient way to
write a query which avoids them. I think it's also partly that "distinct" is
trickier to fix because it shares a codepath with "distinct on", which is
decidedly more complex than a simple "distinct".

> d=# explain analyze select distinct test_col from mytable;
>                                                               QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------
>  Unique  (cost=0.00..14927.69 rows=27731 width=4) (actual time=0.144..915.214 rows=208701 loops=1)
>    ->  Index Scan using "mytable(test_col)" on mytable  (cost=0.00..14160.38 rows=306925 width=4) (actual time=0.140..575.580 rows=306925 loops=1)
>  Total runtime: 1013.657 ms
> (3 rows)

I assume you have random_page_cost dialled way down? The costs seem too low
for the default random_page_cost. This query would usually generate a sort
rather than an index scan.
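
For anyone reproducing this, the cost assumption is easy to check and reset per session. A sketch, reusing the table and column names from the thread (4.0 was the stock default in 8.2; nothing here is written to disk):

```sql
-- Check whether random_page_cost has been lowered, then re-plan with the
-- stock value for comparison.  SET only affects the current session.
SHOW random_page_cost;
SET random_page_cost = 4.0;   -- the 8.2 default
EXPLAIN ANALYZE SELECT DISTINCT test_col FROM mytable;
```

If the plan flips back to a sort with the default value, the low setting explains the index scan.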


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PERFORM] Why are distinct and group by choosing different plans?

2007-08-02 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> I think "distinct" just doesn't know about hash aggregates yet. That's partly
> an oversight and partly of a "feature" in that it gives a convenient way to
> write a query which avoids them. I think it's also partly that "distinct" is
> trickier to fix because it's the same codepath as "distinct on" which is
> decidedly more complex than a simple "distinct".

It's not an oversight :-(.  But the DISTINCT/DISTINCT ON code is old,
crufty, and tightly entwined with ORDER BY processing.  It'd be nice to
clean it all up someday, but the effort seems a bit out of proportion
to the reward...

regards, tom lane



RES: RES: [PERFORM] Improving select performance

2007-08-02 Thread Carlos H. Reimer
Hi,

In this case, I believe the best choice to improve the performance of this
particular SQL statement is to add 'set join_collapse_limit = 1;' just
before it, correct?

Is there anything else we could do to make the planner choose better paths
in this case while keeping the default join_collapse_limit?
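
If the setting is adopted, it doesn't have to leak into the rest of the session; it can be scoped to a single transaction. A sketch, with the problem query itself elided:

```sql
BEGIN;
SET LOCAL join_collapse_limit = 1;  -- reverts automatically at COMMIT/ROLLBACK
-- ... the problem SELECT goes here ...
COMMIT;
```

That way other statements on the same connection still plan with the default.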

Thank you in advance!

Reimer

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] on behalf of Carlos H.
> Reimer
> Sent: Wednesday, 1 August 2007 21:26
> To: Alvaro Herrera
> Cc: Tom Lane; pgsql-performance@postgresql.org
> Subject: RES: RES: [PERFORM] Improving select performance
>
>
> Yes, but as the change did not alter the response time I used the original
> view.
>
> Anyway here are the response times using the changed view (without the
> concatenation conditions):
>
> with join_collapse_limit set to 8:
> ---------------------------------------------------------------------------
>  Nested Loop Left Join  (cost=963.68..76116.63 rows=1 width=194) (actual time=8219.028..1316669.201 rows=256 loops=1)
>    ->  Nested Loop  (cost=963.68..76116.23 rows=1 width=198) (actual time=8196.502..1316638.186 rows=256 loops=1)
>          Join Filter: ((gra.filmat = ive.filmat) AND (gra.codmat = ive.codmat) AND (gra.codcor = ive.codcor) AND (gra.codtam = ive.codtam))
>          ->  Nested Loop  (cost=1.11..3370.95 rows=1 width=150) (actual time=33.058..255.428 rows=414 loops=1)
>                Join Filter: (div.coddiv = ddiv.codtab)
>                ->  Nested Loop  (cost=1.11..3369.89 rows=1 width=159) (actual time=33.043..249.609 rows=414 loops=1)
>                      Join Filter: (sub.codsub = dsub.codtab)
>                      ->  Nested Loop  (cost=1.11..3368.82 rows=1 width=168) (actual time=33.026..243.603 rows=414 loops=1)
>                            Join Filter: ((gra.codcor)::text = (div.codite)::text)
>                            ->  Hash Join  (cost=1.11..3356.11 rows=9 width=145) (actual time=33.004..222.375 rows=414 loops=1)
>                                  Hash Cond: ((gra.codtam)::text = (sub.codite)::text)
>                                  ->  Nested Loop  (cost=0.00..3352.55 rows=377 width=122) (actual time=32.810..219.046 rows=414 loops=1)
>                                        ->  Index Scan using i_fk_pro_ddep on tt_pro pro  (cost=0.00..123.83 rows=437 width=76) (actual time=25.199..118.851 rows=414 loops=1)
>                                              Index Cond: (1::numeric = depart)
>                                        ->  Index Scan using pk_gra on tt_gra gra  (cost=0.00..7.37 rows=1 width=46) (actual time=0.225..0.231 rows=1 loops=414)
>                                              Index Cond: ((pro.filmat = gra.filmat) AND (pro.codmat = gra.codmat))
>                                  ->  Hash  (cost=1.05..1.05 rows=5 width=32) (actual time=0.039..0.039 rows=5 loops=1)
>                                        ->  Seq Scan on tt_sub sub  (cost=0.00..1.05 rows=5 width=32) (actual time=0.009..0.015 rows=5 loops=1)
>                            ->  Seq Scan on tt_div div  (cost=0.00..1.15 rows=15 width=32) (actual time=0.003..0.015 rows=15 loops=414)
>                      ->  Seq Scan on td_sub dsub  (cost=0.00..1.03 rows=3 width=9) (actual time=0.002..0.005 rows=3 loops=414)
>                ->  Seq Scan on td_div ddiv  (cost=0.00..1.03 rows=3 width=9) (actual time=0.002..0.005 rows=3 loops=414)
>          ->  Hash Join  (cost=962.57..72738.01 rows=363 width=114) (actual time=0.588..3178.606 rows=857 loops=414)
>                Hash Cond: (ive.sequen = ven.sequen)
>                ->  Nested Loop  (cost=0.00..69305.21 rows=657761 width=85) (actual time=0.041..2623.627 rows=656152 loops=414)
>                      ->  Seq Scan on td_nat nat  (cost=0.00..1.24 rows=1 width=9) (actual time=0.004..0.012 rows=1 loops=414)
>                            Filter: (-3::numeric = codtab)
>                      ->  Seq Scan on tt_ive ive  (cost=0.00..62726.36 rows=657761 width=76) (actual time=0.034..1685.506 rows=656152 loops=414)
>                            Filter: ((sitmov <> 'C'::bpchar) AND ('001'::bpchar = codfil))
>                ->  Hash  (cost=960.39..960.39 rows=174 width=89) (actual time=41.542..41.542 rows=394 loops=1)
>                      ->  Hash Left Join  (cost=3.48..960.39 rows=174 width=89) (actual time=16.936..40.693 rows=394 loops=1)
>                            Hash Cond: ((ven.filcli = cfg.vc_filcli) AND (ven.codcli = cfg.vc_codcli))
>                            ->  Hash Join  (cost=2.45..958.05 rows=174 width=106) (actual time=16.895..39.747 rows=394 loops=1)
>                                  Hash Cond: ((ven.filpgt = pla.filpgt) AND (ven.codpgt = pla.codpgt))
>                                  ->  Index Scan using i_l

Re: RES: RES: [PERFORM] Improving select performance

2007-08-02 Thread Tom Lane
"Carlos H. Reimer" <[EMAIL PROTECTED]> writes:
> In this case, I believe the best choice to improve the performance of this
> particular SQL statement is adding the 'set join_collapse_limit = 1;' just
> before the join statement, correct?

That's a mighty blunt instrument.  The real problem with your query is
the misestimation of the join sizes --- are you sure the table
statistics are up to date?  Maybe you'd get better estimates with more
statistics (ie, increase the stats target for these tables).
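
Tom's suggestion might look like this for the worst-misestimated table in the posted plan. A sketch only: tt_ive and codfil come from the plan, and 100 is just an illustrative target, not a recommendation:

```sql
-- Give the planner more detailed statistics on a filtered column, then
-- refresh the stats so the new target takes effect.
ALTER TABLE tt_ive ALTER COLUMN codfil SET STATISTICS 100;
ANALYZE tt_ive;
```

After the ANALYZE, re-run EXPLAIN ANALYZE and compare the estimated row counts against the actual ones.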

regards, tom lane
