Re: [HACKERS] cross column correlation revisted

marcin mank Wed, 14 Jul 2010 14:58:39 -0700

On Wed, Jul 14, 2010 at 5:13 PM, Robert Haas <[email protected]> wrote:
> 2010/7/14 Tom Lane <[email protected]>:
>> If the combination of columns is actually interesting, there might well
>> be an index in place, or the DBA might be willing to create it.
>
> Indexes aren't free, though, nor even close to it.
>
> Still, I think we should figure out the underlying mechanism first and
> then design the interface afterwards.  One idea I had was a way to say
> "compute the MCVs and histogram buckets for this table WHERE
> <predicate>".  If you can prove predicate for a particular query, you
> can use the more refined statistics in place of the full-table
> statistics.  This is fine for the breast cancer case, but not so
> useful for the zip code/street name case (which seems to be the really
> tough one).
>


One way of dealing with the zipcode problem is estimating NDST =
count(distinct row(zipcode, street))  - i.e. multi-column ndistinct.

Then the planner doesn`t have to assume that the selectivity of a
equality condition involving both zipcode and city is a multiple of
the respective selectivities. As a first cut it can assume that it
will get count(*) / NDST rows, but there are ways to improve it.

Greetings
Marcin Mańk

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] cross column correlation revisted

Reply via email to