On Thu, Dec 02, 2004 at 22:37:38 -0800,
  Vincent Hikida <[EMAIL PROTECTED]> wrote:
> 2004-12-01  2003-02-22
> 2005-03-04  2003-02-22          (a)
> 2005-03-05  2004-12-15          (b)
> 2005-03-05  2004-06-18          (c)
> 2007-04-12  2005-06-18          (d)
> 
> Let's say that there are a million entries where the toDate is less than 
> today 2004-12-02. That is less than (a) in the index. From the index then 
> only a, b, c, and d should be scanned further. a and c would be picked 

That is correct, but that part relies only on the part of the index dependent
on toDate.

> based on the index values because 2004-12-02 is between the from and end 
> date. However, b and d would be excluded immediately because the the from 
> date is greater than 2004-12-02 and would save the optimizer from even 
> reading the table for these index entries because the fromDate is in the 
> index.

That is not correct. Postgres currently doesn't have a way to skip ahead
on an index scan. So what will happen is that a, b, c, d and will all be
considered and b and d removed by a filter rule.

> This may be a somewhat extreme example but my experience is in most systems 
> old historical data makes up the bulk of the data and newer data is a much 
> smaller amount. In addition most people are interested in data from the 
> most recent month.

The idea of having a toDate index is good, it is just that having a multicolumn
index doesn't help for this problem. In fact by making the index wider, it
will slow things down.
 
> Of course I may be mistaken about the data distribution.

The distribution of values is what makes toDate or FromDate a better index
(if any) to use. You may very well be correct that for most people toDate
will more likely be the better index to use.

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Reply via email to