Re: [HACKERS] cross column correlation revisted

2010-07-15 Thread David Fetter
On Thu, Jul 15, 2010 at 12:04:21PM +0200, Hans-Jürgen Schönig wrote: > hello ... > > a view is already nice but i think it is still too narrow. One sure way to fail is to take on a problem in chunks too large. If we get even one of the cross-column issues solved by statistics, we'll be ahead of

Re: [HACKERS] cross column correlation revisted

2010-07-15 Thread Joshua Tolley
On Thu, Jul 15, 2010 at 12:04:21PM +0200, Hans-Jürgen Schönig wrote: > hello ... > > a view is already nice but i think it is still too narrow. > the problem is: you don't want a view for every potential join. > in addition to that - ideally there is not much left of a view when it comes > to ch

Re: [HACKERS] cross column correlation revisted

2010-07-15 Thread Hans-Jürgen Schönig
hello ... a view is already nice but i think it is still too narrow. the problem is: you don't want a view for every potential join. in addition to that - ideally there is not much left of a view when it comes to checking for costs. so, i think, this is not the kind of approach leading to total

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Dimitri Fontaine
Joshua Tolley writes: >> >> ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id =3D y.id AND x.id= 2 =3D y.id2) >> >=20 >> it says X and Y ... the selectivity of joins are what i am most >> interested in. cross correlation of columns within the same table are >> just a byproduct. the core thing

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread marcin mank
On Wed, Jul 14, 2010 at 5:13 PM, Robert Haas wrote: > 2010/7/14 Tom Lane : >> If the combination of columns is actually interesting, there might well >> be an index in place, or the DBA might be willing to create it. > > Indexes aren't free, though, nor even close to it. > > Still, I think we shou

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Robert Haas
2010/7/14 Tom Lane : > If the combination of columns is actually interesting, there might well > be an index in place, or the DBA might be willing to create it. Indexes aren't free, though, nor even close to it. Still, I think we should figure out the underlying mechanism first and then design th

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Joshua Tolley
On Wed, Jul 14, 2010 at 04:41:01PM +0200, PostgreSQL - Hans-Jürgen Schönig wrote: > hello ... > > look at the syntax i posted in more detail: > > >>ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = > >> y.id2) > > > it says X and Y ... > the selectivity of joins are wha

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
hello ... look at the syntax i posted in more detail: >> ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = >> y.id2) > it says X and Y ... the selectivity of joins are what i am most interested in. cross correlation of columns within the same table are just a byprod

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Andrew Dunstan
Tom Lane wrote: If the combination of columns is actually interesting, there might well be an index in place, or the DBA might be willing to create it. I'm having a hard time imagining an interesting case where that wouldn't be so. For that matter, have you considered the idea of examin

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Tom Lane
=?iso-8859-1?Q?PostgreSQL_-_Hans-J=FCrgen_Sch=F6nig?= writes: > i think that having stats on an index is a problem by itself for 2 reasons - > for cross column correlation at least: > a.) joins cannot be covered by an index on two tables - we would fix > "inside a table correlation probl

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
hello tom, i think that having stats on an index is a problem by itself for 2 reasons - for cross column correlation at least: a.) joins cannot be covered by an index on two tables - we would fix "inside a table correlation problems" but not joins. b.) who says that there is act

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Joshua Tolley
On Wed, Jul 14, 2010 at 01:21:19PM +0200, Yeb Havinga wrote: > Heikki Linnakangas wrote: >> However, the problem is how to represent and store the >> cross-correlation. For fields with low cardinality, like "gender" and >> boolean "breast-cancer-or-not" you can count the prevalence of all the

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Tom Lane
Heikki Linnakangas writes: > On 14/07/10 13:12, PostgreSQL - Hans-Jürgen Schönig wrote: >> maybe somehow like this ... >> ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4) > +1 is my general feeling, it's good if you can tell the system to > collect additional statistics where

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Yeb Havinga
Heikki Linnakangas wrote: However, the problem is how to represent and store the cross-correlation. For fields with low cardinality, like "gender" and boolean "breast-cancer-or-not" you can count the prevalence of all the different combinations, but that doesn't scale. Another often cited exam

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
On Jul 14, 2010, at 12:40 PM, Heikki Linnakangas wrote: > On 14/07/10 13:12, PostgreSQL - Hans-Jürgen Schönig wrote: >> hello everybody, >> >> we are currently facing some serious issues with cross correlation issue. >> consider: 10% of all people have breast cancer. we have 2 genders (50:50). >

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Heikki Linnakangas
On 14/07/10 13:12, PostgreSQL - Hans-Jürgen Schönig wrote: hello everybody, we are currently facing some serious issues with cross correlation issue. consider: 10% of all people have breast cancer. we have 2 genders (50:50). if i select all the men with breast cancer, i will get basically nobody

[HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
hello everybody, we are currently facing some serious issues with cross correlation issue. consider: 10% of all people have breast cancer. we have 2 genders (50:50). if i select all the men with breast cancer, i will get basically nobody - the planner will overestimate the output. this is the com