Re: [HACKERS] Correlation in cost_index()

2003-08-20 Thread Manfred Koizar
On Fri, 8 Aug 2003 16:53:48 -0700, Sean Chittenden <[EMAIL PROTECTED]> wrote: >the problem with your patch was >that it picked an index less often than the current code when there >was low correlation. Maybe bit rot? What version did you apply the patch against? Here is a new version for Postgre

Re: [HACKERS] Correlation in cost_index()

2003-08-14 Thread Tom Lane
Sean Chittenden <[EMAIL PROTECTED]> writes: > indexCorrelation is 1.0 for the 1st key in a multi-column index. ... only if it's perfectly correlated. > As things stand, however, if a multi-column key is > used, the indexCorrelation is penalized by the size of the number of > keys found in the mul

Re: [HACKERS] Correlation in cost_index()

2003-08-14 Thread Manfred Koizar
On Fri, 8 Aug 2003 11:06:56 -0700, Sean Chittenden <[EMAIL PROTECTED]> wrote: >[...] it'd seem as though an avg depth of >nodes in index * tuples_fetched * (random_io_cost * indexCorrelation) >would be closer than where we are now... Index depth does not belong here because we walk down the index

Re: [HACKERS] Correlation in cost_index()

2003-08-14 Thread Zeugswetter Andreas SB SD
> In both cases ANALYZE will calculate correlation 1.0 for column X, > and something near zero for column Y. We would like to come out with > index correlation 1.0 for the left-hand case and something much less > (but, perhaps, not zero) for the right-hand case. I don't really see > a way to do

Re: [HACKERS] Correlation in cost_index()

2003-08-14 Thread Sean Chittenden
> > Which suggests to me that line 3964 in > > ./src/backend/utils/adt/selfuncs.c isn't right for multi-column > > indexes, esp for indexes that are clustered. I don't know how to > > address this though... Tom, any hints? > > Yes, we knew that already. Oliver had suggested simply dropping the

Re: [HACKERS] Correlation in cost_index()

2003-08-11 Thread Manfred Koizar
On Fri, 08 Aug 2003 18:25:41 -0400, Tom Lane <[EMAIL PROTECTED]> wrote: > Two examples: [...] One more example: X Y A A a B A C b A B B b C C A c B C C Cor

Re: [HACKERS] Correlation in cost_index()

2003-08-11 Thread Sean Chittenden
> > AFAICS (part of) the real problem is in costsize.c:cost_index() where > > IO_cost is calculated from min_IO_cost, pages_fetched, > > random_page_cost, and indexCorrelation. The current implementation > > uses indexCorrelation^2 to interpolate between min_IO_cost and > > max_IO_cost, which IMHO

Re: [HACKERS] Correlation in cost_index()

2003-08-10 Thread Manfred Koizar
On Fri, 8 Aug 2003 15:10:06 -0700, Sean Chittenden <[EMAIL PROTECTED]> wrote: >> Yes, we knew that already. Oliver had suggested simply dropping the >> division by nKeys, thus pretending that the first-column correlation >> is close enough. That seems to me to be going too far in the other >> dir

Re: [HACKERS] Correlation in cost_index()

2003-08-10 Thread Manfred Koizar
On Thu, 7 Aug 2003 13:44:19 -0700, Sean Chittenden <[EMAIL PROTECTED]> wrote: >> The indexCorrelation^2 algorithm was only a quick hack with no theory >> behind it :-(. I've wanted to find some better method to put in there, >> but have not had any time to research the problem. > >Could we "quick

Re: [HACKERS] Correlation in cost_index()

2003-08-10 Thread Sean Chittenden
> >[...] it'd seem as though an avg depth of > >nodes in index * tuples_fetched * (random_io_cost * indexCorrelation) > >would be closer than where we are now... > > Index depth does not belong here because we walk down the index only > once per index scan not once per tuple. It might be part of

Re: [HACKERS] Correlation in cost_index()

2003-08-09 Thread Sean Chittenden
> > Hrm, after an hour of searching and reading, I think one of the > > better papers on the subject can be found here: > > http://www.cs.ust.hk/faculty/dimitris/PAPERS/TKDE-NNmodels.pdf > > Interesting paper, but I don't see the connection to index order > correlation? Nothing that I found was n

Re: [HACKERS] Correlation in cost_index()

2003-08-08 Thread Sean Chittenden
> ># SHOW effective_cache_size ; > > effective_cache_size > >-- > > 4456 > >(1 row) > > Only 35 MB? Are you testing on such a small machine? Testing on my laptop right now... can't hack on my production DBs the same way I can my laptop. > >The stats are attached && bzip2 com

Re: [HACKERS] Correlation in cost_index()

2003-08-08 Thread Tom Lane
Sean Chittenden <[EMAIL PROTECTED]> writes: > Which suggests to me that line 3964 in > ./src/backend/utils/adt/selfuncs.c isn't right for multi-column > indexes, esp for indexes that are clustered. I don't know how to > address this though... Tom, any hints? Yes, we knew that already. Oliver ha

Re: [HACKERS] Correlation in cost_index()

2003-08-08 Thread Manfred Koizar
On Fri, 8 Aug 2003 16:53:48 -0700, Sean Chittenden <[EMAIL PROTECTED]> wrote: ># SHOW effective_cache_size ; > effective_cache_size >-- > 4456 >(1 row) Only 35 MB? Are you testing on such a small machine? >The stats are attached && bzip2 compressed. Nothing was attached. Di

Re: [HACKERS] Correlation in cost_index()

2003-08-07 Thread Tom Lane
Sean Chittenden <[EMAIL PROTECTED]> writes: > Hrm, after an hour of searching and reading, I think one of the better > papers on the subject can be found here: > http://www.cs.ust.hk/faculty/dimitris/PAPERS/TKDE-NNmodels.pdf Interesting paper, but I don't see the connection to index order correlat

Re: [HACKERS] Correlation in cost_index()

2002-10-04 Thread Manfred Koizar
On Thu, 3 Oct 2002 10:45:08 -0600 (MDT), "scott.marlowe" <[EMAIL PROTECTED]> wrote: > effective cache size is the default (i.e. commented out) The default is 1000, meaning ca. 8 MB, which seems to be way too low. If your server is (almost) exclusively used by Postgres, try setting it to represent

Re: [HACKERS] Correlation in cost_index()

2002-10-04 Thread Manfred Koizar
On Thu, 03 Oct 2002 14:50:00 -0400, Tom Lane <[EMAIL PROTECTED]> wrote: >> indexCorrelation is calculated by dividing the correlation of the >> first index column by the number of index columns. > >Yeah, I concluded later that that was bogus. I've been thinking of >just using the correlation of t

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread Tom Lane
Manfred Koizar <[EMAIL PROTECTED]> writes: > Never mind! I just stumbled over those lines in selfuncs.c where > indexCorrelation is calculated by dividing the correlation of the > first index column by the number of index columns. Yeah, I concluded later that that was bogus. I've been thinking

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread Manfred Koizar
On Thu, 3 Oct 2002 10:59:54 -0600 (MDT), "scott.marlowe" <[EMAIL PROTECTED]> wrote: >>are multicolunm indices involved in your estimator problems? > >No. Although I use them a fair bit, none of the problems I've encountered >so far have involved them. But I'd be willing to setup some test index

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread scott.marlowe
On Thu, 3 Oct 2002, Manfred Koizar wrote: > On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), "scott.marlowe" > <[EMAIL PROTECTED]> wrote: > >I've found that when the planner misses, sometimes it misses > >by HUGE amounts on large tables, > > Scott, > > yet another question: are multicolunm indices inv

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread scott.marlowe
On Thu, 3 Oct 2002, Manfred Koizar wrote: > On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), "scott.marlowe" > <[EMAIL PROTECTED]> wrote: > >I'd certainly be willing to do some testing on my own data with them. > > Great! > > >Gotta patch? > > Not yet. > > > I've found that when the planner misses

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread Manfred Koizar
On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), "scott.marlowe" <[EMAIL PROTECTED]> wrote: >I've found that when the planner misses, sometimes it misses >by HUGE amounts on large tables, Scott, yet another question: are multicolunm indices involved in your estimator problems? Servus Manfred ---

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread Manfred Koizar
On Thu, 03 Oct 2002 12:40:20 +0200, I wrote: >>Gotta patch? > >Yes, see below. Oh, did I mention that inserting some break statements after the switch cases helps a lot? :-( Cavus venter non laborat libenter ... Servus Manfred ---(end of broadcast)-

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread Manfred Koizar
On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), "scott.marlowe" <[EMAIL PROTECTED]> wrote: >I'd certainly be willing to do some testing on my own data with them. >Gotta patch? Yes, see below. Disclaimer: Apart from "make; make check" this is completely untested. Use at your own risk. Have fun! Se

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread Manfred Koizar
On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), "scott.marlowe" <[EMAIL PROTECTED]> wrote: >I'd certainly be willing to do some testing on my own data with them. Great! >Gotta patch? Not yet. > I've found that when the planner misses, sometimes it misses >by HUGE amounts on large tables, and I ha

Re: [HACKERS] Correlation in cost_index()

2002-10-02 Thread Manfred Koizar
On Wed, 02 Oct 2002 18:48:49 -0400, Tom Lane <[EMAIL PROTECTED]> wrote: >I don't think it's really a good idea to expect users to pick among >multiple cost functions The idea is that PG is shipped with a default representing the best of our knowledge and users are not encouraged to change it. Wh

Re: [HACKERS] Correlation in cost_index()

2002-10-02 Thread Tom Lane
Manfred Koizar <[EMAIL PROTECTED]> writes: > AFAICS (part of) the real problem is in costsize.c:cost_index() where > IO_cost is calculated from min_IO_cost, pages_fetched, > random_page_cost, and indexCorrelation. The current implementation > uses indexCorrelation^2 to interpolate between min_IO_

Re: [HACKERS] Correlation in cost_index()

2002-10-02 Thread scott.marlowe
On Wed, 2 Oct 2002, Manfred Koizar wrote: > As nobody knows how each of these proposals performs in real life > under different conditions, I suggest to leave the current > implementation in, add all three algorithms, and supply a GUC variable > to select a cost function. I'd certainly be willin

[HACKERS] Correlation in cost_index()

2002-10-02 Thread Manfred Koizar
You all know this FAQ: "Why does Postgres not use my index?" Half of the time this problem can easily be solved by casting a literal to the type of the respective column; this is not my topic here. In many other cases it turns out that the planner over-estimates the cost of an index scan. Some