[SQL] Index lookup on > and < criteria

2005-11-01 Thread David Durham
Apologies if this question is asked often.  I'm doing some select 
statements based on a timestamp field.  I have an index on the field, 
and when I use the '=' operator the index is used.  However, if I use 
the '>' or '<' operators, then it does a full table scan.  I've got 
around 6 million rows, so I would think that an index scan would be more 
appropriate.



Here are the statements I'm looking at:

   select * from myTable where myTimeStamp = '10/1/2005';

uses an index.

   select max(myTimeStamp) from myTable;
   select * from myTable where myTimeStamp < '10/2/2005';
   select * from myTable where myTimeStamp < '10/2/2005' and myTimeStamp
   >= '10/1/2005';

do not use indexes.  Can anyone point me to some info about what's going 
on?  I've started reading through the manual (chapter 13), which I think 
covers query optimization, index usage, etc.  It seems like this would be 
a common enough problem that it would have a relatively simple solution. 
 Thanks.
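
A quick way to see which plan each of these gets (a sketch, using only the 
table and column names above) is EXPLAIN ANALYZE on the same statements:

   explain analyze select * from myTable where myTimeStamp = '10/1/2005';
   explain analyze select * from myTable where myTimeStamp < '10/2/2005';

The first output line names the plan node (Index Scan vs. Seq Scan) and shows 
the planner's estimated row count next to the actual one.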



-Dave







Re: [SQL] Index lookup on > and < criteria

2005-11-01 Thread David Durham

Michael Fuhr wrote:

No need to guess: run the queries with enable_seqscan disabled and
see if an index scan is indeed faster.



  select max(myTimeStamp) from myTable;



In current releases min() and max() can't use indexes; search the
archives for numerous discussions of the reasons.  The workarounds
are, respectively:


Good to know.  Thanks all.
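
For reference, the workaround usually given for max() on pre-8.1 releases is 
an ORDER BY ... LIMIT 1 query, which can use the index; a sketch against the 
earlier generic example (min() would order ascending):

   select myTimeStamp from myTable
   order by myTimeStamp desc limit 1;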


How many rows do these queries return?  If they return a significant
portion of the table then the planner might think that a sequential
scan would be faster than an index scan.  It would be useful to see
the EXPLAIN ANALYZE output of these queries so we can see how
accurate the planner's row count estimates are.


Ok.  Looks like you guys caught me in a(n unintentional) lie.  Here goes:

sipcdr=# explain analyze select * from october_cdr_call
         where begin_time >= '10/1/2005' and begin_time < '10/4/2005';

                                QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using october_begin_time on october_cdr_call
   (cost=0.00..98383.82 rows=24594 width=568)
   (actual time=0.280..79274.579 rows=538592 loops=1)
   Index Cond: ((begin_time >= '2005-10-01 00:00:00'::timestamp without time zone)
                AND (begin_time < '2005-10-04 00:00:00'::timestamp without time zone))
 Total runtime: 81457.938 ms
(3 rows)

sipcdr=# explain analyze select * from october_cdr_call
         where begin_time < '10/15/2005';

                                QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on october_cdr_call
   (cost=0.00..273437.39 rows=1639584 width=568)
   (actual time=11.623..43681.396 rows=2609215 loops=1)
   Filter: (begin_time < '2005-10-15 00:00:00'::timestamp without time zone)
 Total runtime: 54366.944 ms
(3 rows)




Has the table been vacuumed and analyzed?  


Brand new table that I haven't deleted anything from yet.



If so, and if the planner's
row count estimates aren't close to the actual row counts, then you
might benefit from increasing the statistics target for the myTimeStamp
column.


Ok, so this is a setting that balances against what might otherwise lead to 
overuse of the vacuum command?  I can just look that one up.
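
For reference, raising the per-column statistics target is just an ALTER TABLE 
plus a fresh ANALYZE; a sketch using the real table from the plans above (the 
default target in 8.0 is 10):

   alter table october_cdr_call alter column begin_time set statistics 100;
   analyze october_cdr_call;

This only changes how much data ANALYZE samples for that column, so it doesn't 
add any vacuum overhead.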




How much memory do you have and what's your effective_cache_size
setting?  


1.5 gig RAM, effective_cache_size is the default, so 1000.
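
For reference, with 1.5 GB of RAM the default of 1000 (8 kB pages in 8.0, so 
about 8 MB) is almost certainly too low; a sketch, assuming most of that RAM 
ends up in the OS file cache:

   -- postgresql.conf: effective_cache_size = 100000   (8 kB pages, roughly 800 MB)
   set effective_cache_size to 100000;  -- or per-session, to experiment

A larger value tells the planner that more of the index is likely to be 
cached, which makes index scans look cheaper.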



That's one of the settings that influences the planner's
decision.  Also, what version of PostgreSQL are you running?


8.0.3



BTW, pgsql-performance would be a more appropriate list to discuss
performance issues.


Ok, I won't cross post this one, but I'll send the next one there.

Here's the final word on this, I think:

sipcdr=# set enable_seqscan=off;
SET
sipcdr=# explain analyze select * from october_cdr_call
         where begin_time < '10/15/2005';

                                QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using october_begin_time on october_cdr_call
   (cost=0.00..6338044.65 rows=1639584 width=568)
   (actual time=51.454..355782.687 rows=2609215 loops=1)
   Index Cond: (begin_time < '2005-10-15 00:00:00'::timestamp without time zone)
 Total runtime: 366289.918 ms


Thanks again,

-Dave



Re: [SQL] Index lookup on > and < criteria

2005-11-02 Thread David Durham

Yeah, analyze did make a difference.  See below.


 Index Scan using october_begin_time on october_cdr_call
   (cost=0.00..98383.82 rows=24594 width=568)
   (actual time=0.280..79274.579 rows=538592 loops=1)
   Index Cond: ((begin_time >= '2005-10-01 00:00:00'::timestamp without time zone)
                AND (begin_time < '2005-10-04 00:00:00'::timestamp without time zone))
 Total runtime: 81457.938 ms
(3 rows)



The estimated row count (24594) is much different than the actual
row count (538592), which makes me wonder if the statistics are up
to date.  Try running ANALYZE on the table and then see if the
estimate is more accurate.  With a more accurate estimate the planner
might choose a sequential scan, but the other queries you posted
suggest that a sequential scan is indeed faster when you're fetching
this much data.


sipcdr=# analyze october_cdr_call;
ANALYZE

sipcdr=# explain analyze select * from october_cdr_call
         where begin_time >= '10/1/2005' and begin_time < '10/4/2005';

                                QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on october_cdr_call
   (cost=0.00..285695.68 rows=500922 width=371)
   (actual time=54.510..50004.458 rows=538592 loops=1)
   Filter: ((begin_time >= '2005-10-01 00:00:00'::timestamp without time zone)
            AND (begin_time < '2005-10-04 00:00:00'::timestamp without time zone))
 Total runtime: 52335.126 ms


-Dave
