Re: [HACKERS] new correlation metric

2008-11-03 Thread Brendan Jurd
On Tue, Nov 4, 2008 at 4:21 AM, Jeff Davis <[EMAIL PROTECTED]> wrote: > We don't want to hold anything up, so feel free to move on to another > patch. If you still have time to review when we have a better patch, > we'd appreciate your feedback even if it's too late for 8.4. > No worries, thanks J

Re: [HACKERS] new correlation metric

2008-11-03 Thread Jeff Davis
On Mon, 2008-11-03 at 18:33 +1100, Brendan Jurd wrote: > If I'm grokking the thread, it looks like Tom suggested a substantial > change in the approach (targeting per-index correlation rather than > per-column) [1], and although you agreed with the spirit of his > suggestion[2], there hasn't been

Re: [HACKERS] new correlation metric

2008-11-02 Thread Brendan Jurd
Hi Jeff, I've been assigned to do an initial review of your "new correlation metric" patch. If I'm grokking the thread, it looks like Tom suggested a substantial change in the approach (targeting per-index correlation rather than per-column) [1], and although you agreed with the spirit of his su

Re: [HACKERS] new correlation metric

2008-10-27 Thread Tom Lane
Ron Mayer <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> By definition, a bitmap scan's cost isn't affected by index order >> correlation. > No? I think I understand that for index scans the correlation > influenced how many data pages are estimated to get sucked in. No, it's not about that,

Re: [HACKERS] new correlation metric

2008-10-27 Thread Ron Mayer
Tom Lane wrote: > Ron Mayer <[EMAIL PROTECTED]> writes: >> ...bitmap cost estimates didn't also change much > By definition, a bitmap scan's cost isn't affected by index order correlation. No? I think I understand that for index scans the correlation influenced how many data pages are estimat

Re: [HACKERS] new correlation metric

2008-10-27 Thread Tom Lane
Ron Mayer <[EMAIL PROTECTED]> writes: > ... I was somewhat surprised that the bitmap cost estimates didn't > also change much. Wouldn't the estimated # of data blocks > read for the bitmap be roughly the same as for the index? By definition, a bitmap scan's cost isn't affected by index order corr

Re: [HACKERS] new correlation metric

2008-10-27 Thread Ron Mayer
Jeff Davis wrote: > Currently, we use correlation to estimate the I/O costs of an index scan. > However, this has some problems: It certainly helps some cases. Without the patch, the little test script below ends up picking the third fastest plan (a seq-scan) instead of a faster bitmapscan, or an

Re: [HACKERS] new correlation metric

2008-10-26 Thread Tom Lane
Jeff Davis <[EMAIL PROTECTED]> writes: > On Sun, 2008-10-26 at 12:44 -0400, Tom Lane wrote: >> We might need to invent some >> other catalog besides pg_statistic if we want to represent per-index >> properties like correlation. > Why can't we just use pg_statistic with the starelid set to the inde

Re: [HACKERS] new correlation metric

2008-10-26 Thread Jeff Davis
On Sun, 2008-10-26 at 12:44 -0400, Tom Lane wrote: > I wonder whether we ought to rethink the problem entirely. [...] What you say makes a lot of sense. We would have to take a sample of index leaf pages, but I think we could get a really useful number from it. For BTree, we can just read the val

Re: [HACKERS] new correlation metric

2008-10-26 Thread Tom Lane
Martijn van Oosterhout <[EMAIL PROTECTED]> writes: > I think the code is in the right direction, but I think what you want > is some kind of estimate of "given I've looked for tuple X, how many > tuples in the next k pages are near this one". Unfortunately I don't see > a way of calculating it other

Re: [HACKERS] new correlation metric

2008-10-26 Thread Greg Stark
I haven't looked at the patch yet -- I'm actually on a train now. I'm sorry if these questions are answered in the patch. I think there are three questions here: A) what metric are we going to use B) how do we estimate the metric given a sample C) how do we draw the conclusions we need based on t

Re: [HACKERS] new correlation metric

2008-10-26 Thread Heikki Linnakangas
Martijn van Oosterhout wrote: > On Sun, Oct 26, 2008 at 01:38:02AM -0700, Jeff Davis wrote: >> I worked with Nathan Boley to come up with what we think is a better >> metric for measuring this cost. It is based on the number of times in >> the ordered sample that you have to physically backtrack (i.e. the

Re: [HACKERS] new correlation metric

2008-10-26 Thread Tom Lane
Martijn van Oosterhout <[EMAIL PROTECTED]> writes: > On Sun, Oct 26, 2008 at 01:38:02AM -0700, Jeff Davis wrote: >> I worked with Nathan Boley to come up with what we think is a better >> metric for measuring this cost. > I think the code is in the right direction, but I think what you want > is s

Re: [HACKERS] new correlation metric

2008-10-26 Thread Martijn van Oosterhout
On Sun, Oct 26, 2008 at 01:38:02AM -0700, Jeff Davis wrote: > I worked with Nathan Boley to come up with what we think is a better > metric for measuring this cost. It is based on the number of times in > the ordered sample that you have to physically backtrack (i.e. the data > value increases, but
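
Illustration (not part of the thread): the preview above is cut off before the metric is fully defined, so the following is only a rough sketch of the idea as quoted, namely counting how often a scan in value order has to move physically backward. It assumes the sample is a list of (value, block_number) pairs, that a "backtrack" means the block number decreases from one sampled row to the next in value order, and that rows with equal values are visited in block order. The function name count_backtracks and the data layout are hypothetical and are not taken from the patch.

```python
from typing import Sequence, Tuple


def count_backtracks(sample: Sequence[Tuple[int, int]]) -> int:
    """Count physical backtracks in a sample of (value, block_number) pairs.

    Assumption: a backtrack is any step, walking the sample in value order,
    where the block number is lower than the previous row's block number.
    """
    # Sort by value; ties are broken by block number, so rows with equal
    # values never count as a backtrack among themselves (an assumption).
    ordered = sorted(sample)

    backtracks = 0
    for (_, prev_block), (_, block) in zip(ordered, ordered[1:]):
        if block < prev_block:
            backtracks += 1
    return backtracks


if __name__ == "__main__":
    clustered = [(1, 0), (2, 1), (3, 2)]          # values follow physical order
    reversed_layout = [(1, 2), (2, 1), (3, 0)]    # every step moves backward
    print(count_backtracks(clustered), count_backtracks(reversed_layout))
```

A perfectly clustered sample gives zero backtracks, while a layout in reverse physical order gives one backtrack per step. How the patch turns such a count into an I/O cost estimate is not visible in these previews.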