Standard cygwin Postgresql 7.4.1 package on Windows 2000 box. I have a relatively small table (about 1 million records) that contains inventory transaction data with an index on transaction date (date datatype). Since the data was inserted chronologically, the transaction date column is strongly correlated with the physical row ordering. As a result, I created a non-unique index on the date column in order to see how much of a performance boost I could get for queries that filter over a range of dates.
My initial problem was the typical newbie issue of convincing the optimizer to use the index I had built on the transaction date. I had forgotten that Postgresql is very picky about data types when utilizing indexes so it didn't take me long to figure out how to rewrite my query in order to take advantage of the index. However, as a result of my experimentation I've run across some puzzling results where the optimizer returned wildly inaccurate row count estimates depending on how I cast my date interval. Consider the following queries: #1) explain analyze select txn_code, count(*) from inventory_txns where txn_date > '07-FEB-2004' and txn_date <= current_date group by txn_code; HashAggregate (cost=1643.56..1643.56 rows=1 width=8) (actual time=731.000..731.000 rows=11 loops=1) -> Index Scan using idx_inventory_txns_txndate on inventory_txns (cost=0.00..1619.94 rows=4723 width=8) (actual time=76.000..709.000 rows=8425 loops=1) Index Cond: ((txn_date > '2004-02-07'::date) AND (txn_date <= ('now'::text)::date)) Total runtime: 742.000 ms #2) explain analyze select txn_code, count(*) from inventory_txns where txn_date > current_date - 30 and txn_date <= current_date group by txn_code; HashAggregate (cost=1667.17..1667.17 rows=1 width=8) (actual time=44.000..44.000 rows=11 loops=1) -> Index Scan using idx_inventory_txns_txndate on inventory_txns (cost=0.00..1643.55 rows=4723 width=8) (actual time=0.000..27.000 rows=8425 loops=1) Index Cond: ((txn_date > (('now'::text)::date - 30)) AND (txn_date <= ('now'::text)::date)) Total runtime: 44.000 ms For the queries above, the optimizer correctly identified that using the index is a big gain over a sequential scan and the estimated vs actual row counts are within striking distance of one another. Now, lets see what happens when I intentionally use casts that prevent the optimizer from utilizing the index on txn_date: #3) explain analyze select txn_code, count(*) from inventory_txns where txn_date > (current_date - 30)::timestamp and txn_date <= current_date group by txn_code; HashAggregate (cost=66035.52..66035.53 rows=5 width=8) (actual time=9149.000..9149.000 rows=11 loops=1) -> Seq Scan on inventory_txns (cost=0.00..65510.82 rows=104941 width=8) (actual time=56.000..9125.000 rows=8425 loops=1) Filter: (((txn_date)::timestamp without time zone > ((('now'::text)::date - 30))::timestamp without time zone) AND (txn_date <= ('now'::text)::date)) Total runtime: 9149.000 ms #4) explain analyze select txn_code, count(*) from inventory_txns where txn_date > (current_date - 30)::timestamp and txn_date <= current_date::timestamp group by txn_code; HashAggregate (cost=70256.76..70256.76 rows=1 width=8) (actual time=10193.000..10193.000 rows=11 loops=1) -> Seq Scan on inventory_txns (cost=0.00..70233.15 rows=4723 width=8) (actual time=14.000..10178.000 rows=8425 loops=1) Filter: (((txn_date)::timestamp without time zone > ((('now'::text)::date - 30))::timestamp without time zone) AND ((txn_date)::timestamp without time zone <= (('now'::text)::date)::timestamp without time zone)) Total runtime: 10194.000 ms So finally, my question to the group is why did my intentional cast on #3 throw the optimizer row count estimate so far off but not for #4 even though both (correctly) ignored the index on txn_date and opted for a sequential scan instead? The txn_date column has statsistics set to 500 and these results are after a full vacuum and analyze. Regards, Shelby Cain __________________________________ Do you Yahoo!? Yahoo! Search - Find what you’re looking for faster http://search.yahoo.com ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend