Re: [PERFORM] Force another plan.

2006-02-19 Thread Fredrik Olsson

Tom Lane skrev:

Fredrik Olsson <[EMAIL PROTECTED]> writes:
  
I have some quite huge queries, inside functions, so debugging is kind 
of hard. But I have located the query that for some reason gets 4 times 
as slow after an analyze.



Could we see EXPLAIN ANALYZE output for these cases, not just EXPLAIN?
It seems a bit premature to be discussing ways to "force" a plan choice
when you don't even have a clear idea what's going wrong.
  

Sorry about that, my fault. Here comes EXPLAIN ANALYZE:

Before VACUUM ANALYZE:
===
Nested Loop  (cost=16.80..34.33 rows=1 width=28) (actual
time=54.197..98.598 rows=1 loops=1)
 Join Filter: (ischildof(2, "outer".calendar) OR (hashed subplan))
 ->  Nested Loop  (cost=0.00..11.66 rows=1 width=32) (actual
time=0.307..0.458 rows=3 loops=1)
   ->  Index Scan using t_events_eventype on t_events e
(cost=0.00..5.82 rows=1 width=28) (actual time=0.241..0.307 rows=3 loops=1)
 Index Cond: (eventtype = 1)
 Filter: (rrfreq IS NOT NULL)
   ->  Index Scan using t_entities_pkey on t_entities te
(cost=0.00..5.83 rows=1 width=4) (actual time=0.035..0.039 rows=1 loops=3)
 Index Cond: (te."ID" = "outer".entity)
 Filter: (partof = 'events'::name)
 ->  Index Scan using t_entities_pkey on t_entities  (cost=0.00..5.85
rows=1 width=4) (actual time=28.445..28.447 rows=0 loops=3)
   Index Cond: (t_entities."ID" = "outer".entity)
   Filter: ((haveaccess(createdby, responsible, "class", false) OR
CASE WHEN (partof = 'contacts'::name) THEN ischildof(ancestorof(me()),
"ID") ELSE false END) AND (subplan))
   SubPlan
 ->  Function Scan on alleventoccurances  (cost=0.00..12.50
rows=1000 width=8) (actual time=19.745..19.745 rows=0 loops=3)
 SubPlan
   ->  Seq Scan on t_attendees  (cost=0.00..16.38 rows=170 width=4)
(actual time=0.422..0.447 rows=2 loops=1)
 Filter: ischildof(2, contact)
Total runtime: 99.814 ms

After VACUUM ANALYZE:
===
Nested Loop  (cost=2.11..4.92 rows=1 width=28) (actual
time=434.321..439.102 rows=1 loops=1)
 Join Filter: ("inner"."ID" = "outer"."ID")
 ->  Hash Join  (cost=2.11..3.67 rows=1 width=32) (actual
time=434.001..438.775 rows=1 loops=1)
   Hash Cond: ("outer"."ID" = "inner".entity)
   Join Filter: (ischildof(2, "inner".calendar) OR (hashed subplan))
   ->  Seq Scan on t_entities  (cost=0.00..1.49 rows=7 width=4)
(actual time=404.539..409.302 rows=2 loops=1)
 Filter: ((haveaccess(createdby, responsible, "class",
false) OR CASE WHEN (partof = 'contacts'::name) THEN
ischildof(ancestorof(me()), "ID") ELSE false END) AND (subplan))
 SubPlan
   ->  Function Scan on alleventoccurances
(cost=0.00..12.50 rows=1000 width=8) (actual time=27.871..27.871 rows=0
loops=14)
   ->  Hash  (cost=1.07..1.07 rows=3 width=28) (actual
time=0.063..0.063 rows=3 loops=1)
 ->  Seq Scan on t_events e  (cost=0.00..1.07 rows=3
width=28) (actual time=0.023..0.034 rows=3 loops=1)
   Filter: ((rrfreq IS NOT NULL) AND (eventtype = 1))
   SubPlan
 ->  Seq Scan on t_attendees  (cost=0.00..1.02 rows=1 width=4)
(actual time=0.205..0.228 rows=2 loops=1)
   Filter: ischildof(2, contact)
 ->  Seq Scan on t_entities te  (cost=0.00..1.18 rows=6 width=4)
(actual time=0.029..0.045 rows=6 loops=1)
   Filter: (partof = 'events'::name)
Total runtime: 440.385 ms

As I read it, the villain is the sequential sqan on t_entities with the
huge filter; haveacces() ...
And that is understandable, doing haveaccess() on all rows is not good.
A  much better solution in this case would be to first get the set that
conforms to (partof = 'events'::name), that would reduce the set to a
third. Secondly applying (eventtype=1) would reduce that to half.  Then
it is time to do the  (haveaccess() ...).

Perhaps a small explanation of the tables, and their intent is in order.

What I have is one "master table" with entities (t_entitites), and two
child tables t_contacts and t_events. In a perfect world rt_contacts and
t_events would have inherited from t_entities as they share muuch data,
but then I would not be able to have foreign key referencing just
events, or just contacts. So instead every row in events and contacts
have a corresponding one to one row in entities. The fourth table used
in this query is t_attendees, that links well attendees for an event to
contacts. Here goes a simplified example (SQL says more then 1000 words):

CREATE TABLE t_entities (
 "ID" integer PRIMARY KEY,
 "createdby" integer NOT NULL,
 "class" integer NOT NULL,  -- Defines visibility for entity and is
used by haveaccess()
 "partof" name NOT NULL
);

CREATE TABLE t_contacts (
 "entity" integer PRIMARY KEY REFERENCES t_entities ("ID"),
 "undercontact" integer REFERENCES t_contacts ("entity"), -- Tree
structure, used by haveaccess()
 "name" varchar(48)
);

ALTER TABLE t_entities ADD FOREIGN KEY ("createdby")
 REFERENCES t_contacts ("entity");

CREATE TABLE t_events (
 "e

[PERFORM] Question about query planner

2006-02-19 Thread Emil Briggs


The following query runs much slower than I would have expected. I ran it 
through EXPLAIN ANALYZE (results included after) and I don't understand why 
the planner is doing what it is. All of the columns from the WHERE part of 
the query are indexed and the indexes are being used. The number of rows 
being reported is equal to the size of the table though so it's really no 
better than just doing a sequential scan. This is running on Postgres 8.0.7 
and the system has been freshly vaccumed with the statistics target set to 
800. Does any know why the query behaves like this? Does it have anything to 
do with the OR statements in the where clause spanning two different tables? 
I tried an experiment where I split this into queries two queries using UNION 
and it ran in less than 1 ms. Which is a solution but I'm still curious why 
the original was so slow.


SELECT DISTINCT a.account_id, l.username, a.status, a.company, a.fax_num,
a.primary_phone, a.responsible_first, a.responsible_last FROM
 accounts a, logins l, supplemental_info i
 WHERE l.account_id=a.account_id and
  i.account_id=a.account_id and
 ((a.primary_phone = 'xxx-xxx-') OR (a.alternate_phone = 'xxx-xxx-') 
OR (i.contact_num = 'xxx-xxx-'))
 ORDER BY a.status, a.primary_phone, a.account_id;


EXPLAIN ANALYZE results

 Unique  (cost=47837.93..47838.02 rows=4 width=92) (actual 
time=850.250..850.252 rows=1 loops=1)
   ->  Sort  (cost=47837.93..47837.94 rows=4 width=92) (actual 
time=850.248..850.248 rows=1 loops=1)
 Sort Key: a.status, a.primary_phone, a.account_id, l.username, 
a.company, a.fax_num, a.responsible_first, a.responsible_last
 ->  Nested Loop  (cost=0.00..47837.89 rows=4 width=92) (actual 
time=610.641..850.222 rows=1 loops=1)
   ->  Merge Join  (cost=0.00..47818.70 rows=4 width=88) (actual 
time=610.602..850.179 rows=1 loops=1)
 Merge Cond: ("outer".account_id = "inner".account_id)
 Join Filter: ((("outer".primary_phone)::text = 
'xxx-xxx-'::text) OR (("outer".alternate_phone)::text = 
'xxx-xxx-'::text) OR (("inner".contact_num)::text = 
'xxx-xxx-'::text))
 ->  Index Scan using accounts_pkey on accounts a  
(cost=0.00..18423.73 rows=124781 width=95) (actual time=0.019..173.523 
rows=124783 loops=1)
 ->  Index Scan using supplemental_info_account_id_idx on 
supplemental_info i  (cost=0.00..15393.35 rows=124562 width=24) (actual 
time=0.014..145.757 rows=124643 loops=1)
   ->  Index Scan using logins_account_id_idx on logins l  
(cost=0.00..4.59 rows=2 width=20) (actual time=0.022..0.023rows=1 loops=1)
 Index Cond: ("outer".account_id = l.account_id)
 Total runtime: 850.429 ms


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Force another plan.

2006-02-19 Thread Tom Lane
Fredrik Olsson <[EMAIL PROTECTED]> writes:
> ->  Seq Scan on t_entities  (cost=0.00..1.49 rows=7 width=4)
> (actual time=404.539..409.302 rows=2 loops=1)
>   Filter: ((haveaccess(createdby, responsible, "class",
> false) OR CASE WHEN (partof = 'contacts'::name) THEN
> ischildof(ancestorof(me()), "ID") ELSE false END) AND (subplan))
>   SubPlan
> ->  Function Scan on alleventoccurances
> (cost=0.00..12.50 rows=1000 width=8) (actual time=27.871..27.871 rows=0
> loops=14)

This seems to be your problem right here: evaluating that subplan for
each row of t_entities is pretty expensive, and yet the planner's
estimating a total cost of only 1.49 to run the scan.  What PG version
is this?  AFAICT we've accounted for subplan costs in scan quals for
a long time, certainly since 7.4.  Can you put together a self-contained
test case for this?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Question about query planner

2006-02-19 Thread Tom Lane
Emil Briggs <[EMAIL PROTECTED]> writes:
> Does any know why the query behaves like this? Does it have anything to 
> do with the OR statements in the where clause spanning two different tables? 

Exactly.

> SELECT DISTINCT a.account_id, l.username, a.status, a.company, a.fax_num,
> a.primary_phone, a.responsible_first, a.responsible_last FROM
>  accounts a, logins l, supplemental_info i
>  WHERE l.account_id=a.account_id and
>   i.account_id=a.account_id and
>  ((a.primary_phone = 'xxx-xxx-') OR (a.alternate_phone = 'xxx-xxx-') 
> OR (i.contact_num = 'xxx-xxx-'))
>  ORDER BY a.status, a.primary_phone, a.account_id;

The system has to fetch all the rows of a, because any of them might
join to a row of i matching the i.contact_num condition, and conversely
it has to fetch every row of i because any of them might join to a row
of a matching one of the phone conditions.  It is therefore necessary
to effectively form the entire join of a and i; until you've done that
there is no way to eliminate any rows.

I'm a bit surprised that it's using the indexes at all --- a hash join
with seqscan inputs would probably run faster.  Try increasing work_mem
a bit.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Need pointers to "standard" pg database(s) for testing

2006-02-19 Thread Christopher Kings-Lynne

Not really, but you can check out the sample databases project:

http://pgfoundry.org/projects/dbsamples/

Chris

Ron wrote:

I assume we have such?

Ron



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] How to optimize a JOIN with BETWEEN?

2006-02-19 Thread andrew
Here's a simplified version of the schema:

Table A has an ID field, an observation date, and other stuff. There are about 
20K IDs and 3K observations per ID. Table B has a matching ID field, minimum 
and maximum dates, a code, and other stuff, about 0-50 records per ID. For a 
given ID, the dates in B never overlap. On A, the PK is (id, obsdate). On B, 
the PK is (id, mindate). I want

SELECT a.id, b.code, AVG(other stuff) FROM A LEFT JOIN B ON a.id=b.id AND 
a.obsdate BETWEEN b.mindate AND b.maxdate GROUP BY 1,2;

Is there a way to smarten the query to take advantage of the fact at most one 
record of B matches A? Also, I have a choice between using a LEFT JOIN or 
inserting dummy records into B to fill in the gaps in the covered dates, which 
would make exactly one matching record. Would this make a difference?

Thanks.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Need pointers to "standard" pg database(s) for

2006-02-19 Thread Christopher Kings-Lynne
Relating to this.  If anyone can find govt or other free db's and 
convert them into pgsql format, I will host them on the dbsamples page. 
 The dbsamples are _really_ popular!


Chris

Scott Marlowe wrote:

On Fri, 2006-02-17 at 10:51, Ron wrote:

I assume we have such?


Depends on what you wanna do.
For transactional systems, look at some of the stuff OSDL has done.

For large geospatial type stuff, the government is a good source, like
www.usgs.gov or the fcc transmitter database.

There are other ones out there.  Really depends on what you wanna test.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Avoiding cartesian product

2006-02-19 Thread Virag Saksena



Szűcs,
    Thanks for your suggestion, I 
guess there is more than one way to attack the problem.
 
I ended up using a trick with limit to get the next 
row ...
 
select (b.gc_minor- a.gc_minor), (b.gc_major- 
a.gc_major)from jam_trace_sys a join jam_trace_sys b on(b.seq_no = 
(select c.seq_no from jam_trace_sys c where c.trace_id = a.trace_id and 
c.seq_no > a.seq_no order by c.trace_id, c.seq_no limit 
1) and b.trace_id = a.trace_id),jam_tracesnap s1, jam_tracesnap 
s2where s1.trace_id = a.trace_idand s1.seq_no = a.seq_noand 
s2.trace_id = b.trace_idand s2.seq_no = b.seq_noand a.trace_id = 
22order by a.seq_no;
 
This gave me a nice clean execution plan (there are 
some extra sources listed, but that is a bug in postgresql) ...
 
QUERY 
PLAN---Sort  
(cost=11.24..11.25 rows=1 width=20) (actual time=0.040..0.040 rows=0 
loops=1) Sort Key: a.seq_no ->Nested Loop  
(cost=0.00..11.23 rows=1 width=20) (actual time=0.028..0.028 rows=0 
loops=1)    Join Filter: ("inner".seq_no = 
"outer".seq_no)    ->Nested Loop  (cost=0.00..9.20 
rows=1 width=32) (actual time=0.024..0.024 rows=0 
loops=1)   Join Filter: ("inner".seq_no = 
"outer".seq_no)   ->Nested Loop  
(cost=0.00..7.17 rows=1 width=32) (actual time=0.020..0.020 rows=0 
loops=1)  Join Filter: 
("inner".seq_no = 
(subplan))  ->Index 
Scan using jam_trace_sys_n1 on jam_trace_sys a  (cost=0.00..3.41 rows=1 
width=16) (actual time=0.016..0.016 rows=0 
loops=1) 
Index Cond: (trace_id = 
22)  ->Index Scan 
using jam_trace_sys_n1 on jam_trace_sys b  (cost=0.00..3.41 rows=1 
width=16) (never 
executed) 
Index Cond: (22 = 
trace_id)  
SubPlan  
->Limit  (cost=0.00..0.33 rows=1 width=8) (never 
executed)    
->Index Scan using jam_trace_sys_n1 on jam_trace_sys c  (cost=0.00..6.36 
rows=19 width=8) (never 
executed)   
Index Cond: ((trace_id = $0) AND (seq_no > 
$1))   ->Index Scan using 
jam_tracesnap_n1 on jam_tracesnap s1  (cost=0.00..2.01 rows=1 width=8) 
(never executed)  Index 
Cond: (22 = trace_id)    ->Index Scan using 
jam_tracesnap_n1 on jam_tracesnap s2  (cost=0.00..2.01 rows=1 width=8) 
(never executed)   Index Cond: (22 = 
trace_id)
Regards,
 
Virag

- Original Message - 
From: "Szűcs Gábor" <[EMAIL PROTECTED]>
To: 
Sent: Monday, January 09, 2006 7:59 AM
Subject: Re: Avoiding cartesian 
product
> Dear Virag,> > AFAIK aggregates aren't indexed in 
postgres (at least not before 8.1, which > indexes min and max, 
iirc).> > Also, I don't think you need to exactly determine the 
trace_id. Try this one > (OTOH; might be wrong):> > select 
DISTINCT ON (a.trace_id, a.seq_no) -- See below>    b.gc_minor 
- a.gc_minor, b.gc_major - a.gc_major> from jam_trace_sys a, 
jam_trace_sys b> where a.trace_id = 22>    and 
b.trace_id = a.trace_id>    and b.seq_no > a.seq_no -- 
Simply ">" is enough> order by a.trace_id, a.seq_no, b.seq_no; -- 
DISTINCT, see below> > The trick is that DISTINCT takes the first 
one in each group (IIRC it is > granted, at least someone told me on one 
of these lists :) ) so if you order > by the DISTINCT attributes and then 
by b.seq_no, you'll get the smallest of > appropriate b.seq_no values for 
each DISTINCT values.> > The idea of DISTINCTing by both columns 
is to make sure the planner finds > the index. (lately I had a similar 
problem: WHERE a=1 ORDER BY b LIMIT 1 > used an index on b, instead of an 
(a,b) index. Using ORDER BY a,b solved it)> > HTH,> 
> --> G.> > > On 2006.01.04. 5:12, Virag 
Saksena wrote:> > > > I have a table which stores cumulative 
values> > I would like to display/chart the deltas between successive 
data > > collections> >  > > If my primary 
key only increments by 1, I could write a simple query> >  
> > select b.gc_minor - a.gc_minor, b.gc_major - a.gc_major> 
>   from jam_trace_sys a, jam_trace_sys b> >  where 
a.trace_id = 22> >    and b.trace_id = 
a.trace_id> >    and b.seq_no = a.seq_no + 1> 
>  order by a.seq_no;> >  > > However the 
difference in sequence number is variable.> > So (in Oracle) I used to 
extract the next seq_no using a correlated > > sub-query> 
>  > > select b.gc_minor - a.gc_minor, b.gc_major - 
a.gc_major> > from jam_trace_sys a, jam_trace_sys b> > where 
a.trace_id = 22> > and (b.trace_id, b.seq_no) => > (select 
a.trace_id, min(c.seq_no) from jam_trace_sys c> > where c.trace_id = 
a.trace_id and c.seq_no > a.seq_no)> >  order by 
a.seq_no;> >  > > For every row in A, The correlated 
sub-query from C will execute> > With an appropriate index, it will 
just descend the index Btree> > go one row to the right and return 
that row (min > :value)> > and join to table B> >  
> > SELECT STATEMENT> >   SORT ORDER BY> 
>    TABLE ACCESS BY INDEX ROWID JAM_TRACE_SYS B> 
>  NESTED LOOPS> 
>