On 12.12.2010 15:17, Martijn van Oosterhout wrote:
On Sun, Dec 12, 2010 at 03:58:49AM +0100, Tomas Vondra wrote:
Very cool that you're working on this.
+1
Let's talk about one special case - I'll explain how the proposed
solution works, and then I'll explain how to make it more general, what
improvements are possible, what issues are there. Anyway this is by no
means a perfect or complete solution - it's just a starting point.
It looks like you handled most of the issues. Just a few points:
- This is obviously applicable to more than just integers, probably
anything with a b-tree operator class. What you've coded seems to rely
on calculations on the values. Have you thought about how it could
work for, for example, strings?
The classic failure case has always been: postcodes and city names.
Strongly correlated, but in a way that the computer can't easily see.
Yeah, and that's actually analogous to the example I used in my
presentation.
The way I think of that problem is that once you know the postcode,
knowing the city name doesn't add any information. The postcode implies
the city name. So the selectivity for "postcode = ? AND city = ?" should
be the selectivity of "postcode = ?" alone. The measurement we need is
"implicativeness": How strongly does column A imply a certain value for
column B. Perhaps that could be measured by counting the number of
distinct values of column B for each value of column A, or something
like that. I don't know what the statisticians call that property, or if
there's some existing theory on how to measure that from a sample.
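One way to picture the "count distinct values of column B for each value of column A" idea is the minimal Python sketch below. It assumes the sample is a list of (A, B) pairs. The names `distinct_b_per_a` and `implicativeness` are hypothetical, and the ratio distinct(A) / distinct((A, B)) is just one possible way to summarise the counts, not an established statistic. The sketch only needs equality and hashing, so it works the same way for strings such as postcodes and city names.

```python
from collections import defaultdict


def distinct_b_per_a(rows):
    """Group a sample of (a, b) pairs: each A value -> set of distinct B values."""
    groups = defaultdict(set)
    for a, b in rows:
        groups[a].add(b)
    return groups


def implicativeness(rows):
    """Hypothetical measure: distinct(A) / distinct((A, B)).

    Returns 1.0 when every A value in the sample co-occurs with exactly
    one B value (A appears to imply B). It moves toward 0 as each A value
    is seen with more and more distinct B values.
    """
    groups = distinct_b_per_a(rows)
    if not groups:
        raise ValueError("sample must not be empty")
    distinct_pairs = sum(len(bs) for bs in groups.values())
    return len(groups) / distinct_pairs


sample = [("10115", "Berlin"), ("10117", "Berlin"),
          ("80331", "Munich"), ("10115", "Berlin")]
print(implicativeness(sample))  # 1.0: each postcode maps to one city
```

A sample can only show a subset of the real (A, B) pairs, so the per-A distinct counts it yields are at most the true ones. The measure therefore tends to overstate how strongly A implies B. Correcting for that is the open question about existing theory raised above.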
That's assuming the combination has any matches. It's possible that the
user chooses a postcode and city combination that doesn't exist, but
that's no different from a user doing "city = 'fsdfsdfsd'" on a single
column, returning no matches. We should assume that the combination
makes sense.
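Under the assumption that the combination makes sense, the selectivity rule can be sketched as follows. If A fully implies B, then "A = ? AND B = ?" gets the selectivity of "A = ?" alone. If the columns are independent, the usual product applies. The linear interpolation between those two extremes, the `degree` parameter, and the function name `combined_selectivity` are all hypothetical choices made for illustration.

```python
def combined_selectivity(sel_a, sel_b, degree):
    """Estimate selectivity of "a = ? AND b = ?" (hypothetical scheme).

    degree = 1.0: A implies B, so the result is sel_a alone.
    degree = 0.0: columns treated as independent, so sel_a * sel_b.
    In between: linear interpolation between the two (an assumption).
    Non-existent (a, b) combinations are not detected: like a single-column
    estimate, this assumes the queried combination makes sense.
    """
    for name, value in (("sel_a", sel_a), ("sel_b", sel_b), ("degree", degree)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sel_a * (degree + (1.0 - degree) * sel_b)


# postcode = ? has selectivity 0.001, city = ? has 0.05
print(combined_selectivity(0.001, 0.05, 1.0))  # 0.001: postcode implies city
```

Treating a user-supplied but non-existent combination like `city = 'fsdfsdfsd'` keeps the estimator simple: it never has to prove that a pair occurs before trusting the implication.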
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)