Re: [PERFORM] Similar querys, better execution time on worst execution plan

2003-06-26 Thread SZUCS Gábor
Fernando,

1. Try EXPLAIN ANALYZE. Cost alone isn't an absolute measure. I think it's
only to see which parts of the query are expected to be slowest. However,
EXP ANA will give you exact times in msec (which effectively means it
executes the query).
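For example (using one of the tables from your plan; the actual numbers will
of course differ):

EXPLAIN ANALYZE
SELECT *
  FROM cont_publicacion p
 WHERE p.id_instalacion = 2::numeric;
-- EXPLAIN alone prints only the estimated costs; EXPLAIN ANALYZE adds
-- "(actual time=... rows=... loops=...)" to each node and a
-- "Total runtime: ... msec" line at the end.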

2. I think calling upper() for each row costs more than direct comparison,
but not sure

3. Notice that there are seq scans with filter conditions like
  "id_instalacion = 2::numeric"
  Do you have indices on id_instalacion, which seems to be a numeric field?
If so, try casting the constant expressions in the query to numeric so that
PostgreSQL may find the index. If you don't have such indices, it may be
worth creating them. (I guess you only have one on the table aliased with c,
since it does an index scan there.)
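Something like this is what I mean (the index name here is made up; adjust
to your schema):

CREATE INDEX cont_publicacion_id_ins_i ON cont_publicacion (id_instalacion);

-- an unadorned integer literal may not match a numeric index on 7.x,
-- so cast the constant to the column's type:
SELECT ...
  FROM cont_publicacion p
 WHERE p.id_instalacion = 2::numeric;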

4. another guess may be indices on (id_instalacion, activo), or, if activo
has few possible values (for example, it may be only one of three letters,
say, 'S', 'A' or 'K'), partial indices like:

CREATE INDEX cont_sbc_id_ins_S ON cont_sbc (id_instalacion)
WHERE activo in ('S', 's');
CREATE INDEX cont_sbc_id_ins_A ON cont_sbc (id_instalacion)
WHERE activo in ('A', 'a');
CREATE INDEX cont_sbc_id_ins_K ON cont_sbc (id_instalacion)
WHERE activo in ('K', 'k');

G.
--- cut here ---
 WHERE c.id_instalacion = 2
AND s.id_instalacion = 2
AND p.id_instalacion = 2
...

 ->  Seq Scan on cont_sbc s  (cost=0.00..4.44 rows=1 width=35)
 Filter: ((id_instalacion = 2::numeric)
  AND (upper((activo)::text) = 'S'::text))
 ->  Index Scan using cont_cont_cont_sbc_fk_i on cont_contenido c
 (cost=0.00..15.56 rows=1 width=43)
 Index Cond: ((c.id_instalacion = 2::numeric)
  AND (c.id_sbc = "outer".id_sbc))
 Filter: (upper((activo)::text) = 'S'::text)
 ->  Seq Scan on cont_publicacion p  (cost=0.00..98.54 rows=442 width=55)
 Filter: (id_instalacion = 2::numeric)



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Performance advice

2003-06-26 Thread Manfred Koizar
On Wed, 25 Jun 2003 11:47:48 +0200, "Michael Mattox"
<[EMAIL PROTECTED]> wrote:
>> |INFO:  --Relation public.jdo_sequencex--
>> |INFO:  Pages 28: Changed 1, Empty 0; Tup 1: Vac 5124, Keep 0, UnUsed 0.
>>   ^  
>> This table could stand more frequent VACUUMs, every 15 minutes or so.
>
>Can you explain what the "Vac" is

That's a long story, where shall I start?  Search for MVCC in the docs
and in the list archives.  So you know that every DELETE and every
UPDATE leaves behind old versions of tuples.  The space occupied by
these cannot be used immediately.  VACUUM is responsible for finding
dead tuples, which are so old that there is no active transaction that
could be interested in their contents, and reclaiming the space.  The
number of such tuples is reported as "Vac".
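You can watch this happen yourself (the column name below is only a guess at
the schema):

-- every UPDATE leaves an old, eventually dead, tuple version behind:
UPDATE jdo_sequencex SET seq = seq + 1;
-- VACUUM VERBOSE reports the dead tuples it reclaims;
-- they show up in the "Vac" figure of the INFO line:
VACUUM VERBOSE jdo_sequencex;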

> and how you knew that it should be vacuumed more often?

jdo_sequencex stores (5000 old versions and 1 active version of) a
single row in 28 pages.  Depending on when you did ANALYSE it and
depending on the SQL statement, the planner might think that a
sequential scan is the most efficient way to access this single row.
A seq scan has to read 28 pages instead of a single page.  Well,
probably all 28 pages are in the OS cache or even in PG's shared
buffers, but 27 pages are just wasted and push out pages you could
make better use of.  And processing those 28 pages is not free of CPU
cost either.  If you VACUUM frequently enough, this relation never grows
beyond one page.

>I'm using Java Data Objects (JDO), which is an O/R mapper.  It generated the
>schema from my object model; by default it used a table for a sequence.  I
>just got finished configuring it to use a real postgres sequence.  With the
>way they have it designed, it opens and closes a connection each time it
>retrieves a sequence.  Would I get a performance increase if I modify their
>code to retrieve multiple sequence numbers in one connection?  For example I
>could have it grab 50 at a time, which would replace 50 connections with 1.

Better yet, modify the code to use the normal access functions for
sequences.
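That is, something along these lines (the sequence name is just an example):

CREATE SEQUENCE jdo_seq CACHE 50;  -- each backend preallocates 50 values
SELECT nextval('jdo_seq');         -- the normal access function
-- CACHE 50 gives you the "grab 50 at a time" behaviour for free,
-- at the price of possible gaps in the sequence.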

Servus
 Manfred



[PERFORM] problem with pg_statistics

2003-06-26 Thread Andre Schubert
Hi,

I think I need a little help with a problem with pg_statistic.
Let's say I have a table to collect traffic data.
The table has a column time_stamp of type timestamptz.
The table has a single-column index on time_stamp.
The table has around 5 million records.

If I delete all statistical data from pg_statistic and do an
EXPLAIN ANALYZE, I get this result.

-
explain analyze select * from tbl_traffic where tbl_traffic.time_stamp >= '2003-05-01' 
and tbl_traffic.time_stamp < '2003-06-01';
NOTICE:  QUERY PLAN:

Index Scan using idx_ts on tbl_traffic  (cost=0.00..97005.57 rows=24586 width=72) 
(actual time=0.19..7532.63 rows=1231474 loops=1)
Total runtime: 8179.08 msec

EXPLAIN
-

After I do a VACUUM FULL VERBOSE ANALYZE, I get the following result.

-
explain analyze select * from tbl_traffic where tbl_traffic.time_stamp >= '2003-05-01' 
and tbl_traffic.time_stamp < '2003-06-01';
NOTICE:  QUERY PLAN:

Seq Scan on tbl_traffic  (cost=0.00..127224.24 rows=1197331 width=52) (actual 
time=0.03..14934.70 rows=1231474 loops=1)
Total runtime: 15548.35 msec

EXPLAIN
-

Now I disable seq scans with SET enable_seqscan TO off
and I get the following.

-
explain analyze select * from tbl_traffic where tbl_traffic.time_stamp >= '2003-05-01' 
and tbl_traffic.time_stamp < '2003-06-01';
NOTICE:  QUERY PLAN:

Index Scan using idx_ts on tbl_traffic  (cost=0.00..3340294.11 rows=1197331 width=52) 
(actual time=0.21..7646.29 rows=1231474 loops=1)
Total runtime: 8285.92 msec

EXPLAIN
-

Could anybody explain, or give some hint, why the index is not used
although it is faster than a sequential scan?
BTW:
 version  
---
 PostgreSQL 7.2 on i686-pc-linux-gnu, compiled by GCC 2.96

Thanks in advance, as



Re: [PERFORM] Similar querys, better execution time on worst execution plan

2003-06-26 Thread Fernando Papa

> -----Original Message-----
> From: SZUCS Gábor [mailto:[EMAIL PROTECTED]
> Sent: Thursday, June 26, 2003 7:31
> To: [EMAIL PROTECTED]
> Subject: Re: [PERFORM] Similar querys, better execution time
> on worst execution plan
> 
> 
> Fernando,
> 
> 1. Try EXPLAIN ANALYZE. Cost alone isn't an absolute measure. 
> I think it's only to see which parts of the query are 
> expected to be slowest. However, EXP ANA will give you exact 
> times in msec (which effectively means it executes the query).

OK, yes, I only did EXPLAIN because I ran the query several times and took the
average run time.  But it's true, it's better to do EXPLAIN ANALYZE.
 
> 2. I think calling upper() for each row costs more than 
> direct comparison, but not sure

It's the only answer I could find... maybe doing a lot of upper() calls and
then comparing costs more than a direct comparison with the two conditions...
 
> 3. Notice that there are seq scans with filter conditions like
>   "id_instalacion = 2::numeric"
>   Do you have indices on id_instalacion, which seems to be a 
> numeric field? if so, try casting the constant expressions in 
> the query to numeric so that postgresql may find the index. 
> If you don't have such indices, it may be worth creating
> them. (I guess you only have one on the table aliased with c, 
> since it does an index scan there.)

Yes, we have an index on id_instalacion, but right now we have only one
installation, so in 99% of the rows the content of this field is 2.  I think
in this case it's OK to choose a seq scan.
 
> 4. another guess may be indices on (id_instalacion, activo), 
> or, if activo has few possible values (for example, it may be 
> only one of three letters, say, 'S', 'A' or 'K'), partial 
> indices like:
> 
> CREATE INDEX cont_sbc_id_ins_S ON cont_sbc (id_instalacion)
> WHERE activo in ('S', 's');
> CREATE INDEX cont_sbc_id_ins_A ON cont_sbc (id_instalacion)
> WHERE activo in ('A', 'a');
> CREATE INDEX cont_sbc_id_ins_K ON cont_sbc (id_instalacion)
> WHERE activo in ('K', 'k');
> 

I need to recheck the "quality" of the "activo" field.  Really, I don't know
whether I'll find a lot of 'S' or a lot of 'N'; maybe we will have a 50%/50%
split of 'S' and 'N'.  This will be important for defining the index.

Thanks for your answer.



Re: [PERFORM] problem with pg_statistics

2003-06-26 Thread Tom Lane
Andre Schubert <[EMAIL PROTECTED]> writes:
> i think i need a little help with a problem with pg_statistic.

Try reducing random_page_cost --- although you'd be foolish to set it on
the basis of just a single test query.  Experiment with a few different
tables, and keep in mind that repeated tests will be affected by caching.
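For experimenting it's easiest to set it per session, e.g.:

SET random_page_cost = 2;  -- default is 4
EXPLAIN ANALYZE
SELECT * FROM tbl_traffic
 WHERE time_stamp >= '2003-05-01' AND time_stamp < '2003-06-01';
-- try a few values, and a few different tables, before putting
-- anything into postgresql.conf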

regards, tom lane



Re: [PERFORM] Similar querys, better execution time on worst execution plan

2003-06-26 Thread SZUCS Gábor
*happy* :)))

G.
--- cut here ---
- Original Message - 
From: "Fernando Papa" <[EMAIL PROTECTED]>
Sent: Thursday, June 26, 2003 3:33 PM


I need to recheck the "quality" of the "activo" field.  Really, I don't know
whether I'll find a lot of 'S' or a lot of 'N'; maybe we will have a 50%/50%
split of 'S' and 'N'.  This will be important for defining the index.

Thanks for your answer.




Re: [PERFORM] problem with pg_statistics

2003-06-26 Thread Manfred Koizar
On Thu, 26 Jun 2003 10:08:05 -0400, Tom Lane <[EMAIL PROTECTED]>
wrote:
>Andre Schubert <[EMAIL PROTECTED]> writes:
>> i think i need a little help with a problem with pg_statistic.
>
>Try reducing random_page_cost

With index scan cost being more than 25 * seq scan cost, I guess that
- all other things held equal - even random_page_cost = 1 wouldn't
help.

Andre might also want to experiment with effective_cache_size and with
ALTER TABLE ... SET STATISTICS.
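For example (the statistics target of 100 here is only an example; the
default is 10):

ALTER TABLE tbl_traffic ALTER COLUMN time_stamp SET STATISTICS 100;
ANALYZE tbl_traffic;  -- re-collect statistics with the larger sample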

Or there's something wrong with correlation?

Andre, what hardware is this running on?  What are the values of
shared_buffers, random_page_cost, effective_cache_size, ... ?  Could
you show us the result of

SELECT * FROM pg_stats
 WHERE tablename = 'tbl_traffic' AND attname = 'time_stamp';

Servus
 Manfred



Re: [PERFORM] problem with pg_statistics

2003-06-26 Thread Tom Lane
Manfred Koizar <[EMAIL PROTECTED]> writes:
> On Thu, 26 Jun 2003 10:08:05 -0400, Tom Lane <[EMAIL PROTECTED]>
> wrote:
>> Try reducing random_page_cost

> With index scan cost being more than 25 * seq scan cost, I guess that
> - all other things held equal - even random_page_cost = 1 wouldn't
> help.

Oh, you're right, I was comparing the wrong estimated costs.  Yeah,
changing random_page_cost won't fix it.

> Or there's something wrong with correlation?

That seems like a good bet.  Andre, is this table likely to be
physically ordered by time_stamp, or nearly so?  If so, do you
expect that condition to persist, or is it just an artifact of
a test setup?
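If it is, the pg_stats correlation entry for time_stamp should be close to 1.
Should the physical ordering have degraded, something like this would restore
it (7.2 CLUSTER syntax; mind the caveats in its docs, since the table is
rebuilt):

CLUSTER idx_ts ON tbl_traffic;  -- re-order the table by the index
ANALYZE tbl_traffic;            -- refresh the statistics afterwards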

regards, tom lane



Re: [PERFORM] problem with pg_statistics

2003-06-26 Thread Andre Schubert
On Thu, 26 Jun 2003 12:03:52 -0400
Tom Lane <[EMAIL PROTECTED]> wrote:

> Manfred Koizar <[EMAIL PROTECTED]> writes:
> > On Thu, 26 Jun 2003 10:08:05 -0400, Tom Lane <[EMAIL PROTECTED]>
> > wrote:
> >> Try reducing random_page_cost
> 
> > With index scan cost being more than 25 * seq scan cost, I guess that
> > - all other things held equal - even random_page_cost = 1 wouldn't
> > help.
> 
> Oh, you're right, I was comparing the wrong estimated costs.  Yeah,
> changing random_page_cost won't fix it.
> 
> > Or there's something wrong with correlation?
> 
> That seems like a good bet.  Andre, is this table likely to be
> physically ordered by time_stamp, or nearly so?  If so, do you
> expect that condition to persist, or is it just an artifact of
> a test setup?
> 

First of all thanks for the quick response.

We have three servers at different places; all are running on Athlon
processors with between 512M and 1024M of RAM, at frequencies between
700 and 1400 MHz.
All servers run Linux 7.2 with kernel 2.4.20.
We use this table to collect traffic data for our clients.
Traffic data are inserted every 5 minutes with the actual datetime of the
transaction, which is why the table should be physically ordered by time_stamp.
All servers are running in production, and I could reproduce the problem on
all three of them.

To answer Manfreds questions:
> Andre, what hardware is this running on?  What are the values of
> shared_buffers, random_page_cost, effective_cache_size, ... ?  Could
> you show us the result of
> 
>   SELECT * FROM pg_stats
>   WHERE tablename = 'tbl_traffic' AND attname = 'time_stamp';

The only changes we have made are

sort_mem = 32000
shared_buffers = 13000

All other values are commented out and should be set to their defaults
by PostgreSQL itself.

#max_fsm_relations = 100# min 10, fsm is free space map
#max_fsm_pages = 1  # min 1000, fsm is free space map


#effective_cache_size = 1000  # default in 8k pages
#random_page_cost = 4
#cpu_tuple_cost = 0.01
#cpu_index_tuple_cost = 0.001
#cpu_operator_cost = 0.0025

Hope this help ...

Thanks, as



Re: [PERFORM] problem with pg_statistics

2003-06-26 Thread Andre Schubert
On Thu, 26 Jun 2003 12:03:52 -0400
Tom Lane <[EMAIL PROTECTED]> wrote:

> Manfred Koizar <[EMAIL PROTECTED]> writes:
> > On Thu, 26 Jun 2003 10:08:05 -0400, Tom Lane <[EMAIL PROTECTED]>
> > wrote:
> >> Try reducing random_page_cost
> 
> > With index scan cost being more than 25 * seq scan cost, I guess that
> > - all other things held equal - even random_page_cost = 1 wouldn't
> > help.
> 
> Oh, you're right, I was comparing the wrong estimated costs.  Yeah,
> changing random_page_cost won't fix it.
> 
> > Or there's something wrong with correlation?
> 
> That seems like a good bet.  Andre, is this table likely to be
> physically ordered by time_stamp, or nearly so?  If so, do you
> expect that condition to persist, or is it just an artifact of
> a test setup?
> 

Sorry, I forgot the pg_stats query...


SELECT * FROM pg_stats where tablename = 'tbl_traffic' and attname = 'time_stamp';
 tablename         | tbl_traffic
 attname           | time_stamp
 null_frac         | 0
 avg_width         | 8
 n_distinct        | 104009
 most_common_vals  | {"2003-06-03 19:12:01.059625+02","2003-02-03 19:52:06.666296+01",
                      "2003-02-13 09:59:45.415763+01","2003-02-28 18:10:28.536399+01",
                      "2003-04-11 18:09:42.30363+02","2003-04-26 20:35:50.110235+02",
                      "2003-05-03 11:09:32.991507+02","2003-05-20 09:53:51.271853+02",
                      "2003-05-21 20:55:59.155387+02","2003-06-02 02:38:28.823182+02"}
 most_common_freqs | {0.0013,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001}
 histogram_bounds  | {"2002-07-01 00:00:00+02","2003-02-21 01:59:46.107696+01",
                      "2003-03-11 15:00:37.418521+01","2003-03-26 18:14:50.028972+01",
                      "2003-04-10 13:43:20.75909+02","2003-04-27 09:03:19.592213+02",
                      "2003-05-08 22:35:41.997616+02","2003-05-22 15:34:42.932958+02",
                      "2003-06-03 00:53:05.870782+02","2003-06-15 08:45:41.154875+02",
                      "2003-06-27 07:18:30.265868+02"}
 correlation       | -0.479749
(1 row)

Thanks, as
