hi tom ...

i thought about creating an index on the expression but the problem is that this is hardly feasable. in 8.0 (what i have here) this would block the table and i would run out of disk space as well. this is a 600 gb biest :(

what about the planner approach?
this would solve the problem for some other issues as well. an index might not be flexible enough :(.

        many thanks,

                hans


On Sep 4, 2006, at 4:57 PM, Tom Lane wrote:

Hans-Juergen Schoenig <[EMAIL PROTECTED]> writes:
consider the following:

        SELECT some_timestamp::date FROM very_large_table GROUP BY
some_timestamp::date

my very_large_table is around 1billion entries.
the problem is: the planner has a problem here as it is taking the
(correct) estimates for timestamp. this avoids a HashAggregate
because the dataset seems to large for work_mem.
what the planner cannot know is that the number of days is quite
limited (in my case around 1000 different values).
i wonder how to teach the planner to take the cast into consideration.

Create an index on that expression.

regression=# create table foo(x) as select x * '864 sec'::interval + now()::timestamp from generate_series(1,10000) x;
SELECT
regression=# analyze foo;
ANALYZE
regression=# explain select x::date from foo group by x::date;
                          QUERY PLAN
---------------------------------------------------------------
 HashAggregate  (cost=205.00..330.00 rows=10000 width=8)
   ->  Seq Scan on foo  (cost=0.00..180.00 rows=10000 width=8)
(2 rows)

regression=# create index fooi on foo((x::date));
CREATE INDEX
regression=# analyze foo;
ANALYZE
regression=# explain select x::date from foo group by x::date;
                          QUERY PLAN
---------------------------------------------------------------
 HashAggregate  (cost=205.00..206.26 rows=101 width=8)
   ->  Seq Scan on foo  (cost=0.00..180.00 rows=10000 width=8)
(2 rows)

regression=#

I had to cheat a little bit here: I tried to do this example with a
timestamptz column, and the index creation failed because timestamptz to
date isn't immutable (it depends on TimeZone).  If yours is too, you
could perhaps do something involving AT TIME ZONE to generate an
immutable conversion to date.

It would perhaps make sense to provide a way to cue ANALYZE to compute
stats on expressions that aren't actually being indexed, but I see no
good reason to limit our attention to cast expressions.

                        regards, tom lane


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to