On Fri, 8 Aug 2003 16:53:48 -0700, Sean Chittenden
<[EMAIL PROTECTED]> wrote:
>the problem with your patch was
>that it picked an index less often than the current code when there
>was low correlation.
Maybe bit rot? What version did you apply the patch against? Here is
a new version for Postgre
Sean Chittenden <[EMAIL PROTECTED]> writes:
> indexCorrelation is 1.0 for the 1st key in a multi-column index.
... only if it's perfectly correlated.
> As things stand, however, if a multi-column key is
> used, the indexCorrelation is penalized by the size of the number of
> keys found in the mul
On Fri, 8 Aug 2003 11:06:56 -0700, Sean Chittenden
<[EMAIL PROTECTED]> wrote:
>[...] it'd seem as though an avg depth of
>nodes in index * tuples_fetched * (random_io_cost * indexCorrelation)
>would be closer than where we are now...
Index depth does not belong here because we walk down the index
> In both cases ANALYZE will calculate correlation 1.0 for column X,
> and something near zero for column Y. We would like to come out with
> index correlation 1.0 for the left-hand case and something much less
> (but, perhaps, not zero) for the right-hand case. I don't really see
> a way to do
> > Which suggests to me that line 3964 in
> > ./src/backend/utils/adt/selfuncs.c isn't right for multi-column
> > indexes, esp for indexes that are clustered. I don't know how to
> > address this though... Tom, any hints?
>
> Yes, we knew that already. Oliver had suggested simply dropping the
On Fri, 08 Aug 2003 18:25:41 -0400, Tom Lane <[EMAIL PROTECTED]>
wrote:
> Two examples: [...]
One more example:
X Y
A A
a B
A C
b A
B B
b C
C A
c B
C C
Cor
> > AFAICS (part of) the real problem is in costsize.c:cost_index() where
> > IO_cost is calculated from min_IO_cost, pages_fetched,
> > random_page_cost, and indexCorrelation. The current implementation
> > uses indexCorrelation^2 to interpolate between min_IO_cost and
> > max_IO_cost, which IMHO
On Fri, 8 Aug 2003 15:10:06 -0700, Sean Chittenden
<[EMAIL PROTECTED]> wrote:
>> Yes, we knew that already. Oliver had suggested simply dropping the
>> division by nKeys, thus pretending that the first-column correlation
>> is close enough. That seems to me to be going too far in the other
>> dir
On Thu, 7 Aug 2003 13:44:19 -0700, Sean Chittenden
<[EMAIL PROTECTED]> wrote:
>> The indexCorrelation^2 algorithm was only a quick hack with no theory
>> behind it :-(. I've wanted to find some better method to put in there,
>> but have not had any time to research the problem.
>
>Could we "quick
> >[...] it'd seem as though an avg depth of
> >nodes in index * tuples_fetched * (random_io_cost * indexCorrelation)
> >would be closer than where we are now...
>
> Index depth does not belong here because we walk down the index only
> once per index scan not once per tuple. It might be part of
> > Hrm, after an hour of searching and reading, I think one of the
> > better papers on the subject can be found here:
> > http://www.cs.ust.hk/faculty/dimitris/PAPERS/TKDE-NNmodels.pdf
>
> Interesting paper, but I don't see the connection to index order
> correlation?
Nothing that I found was n
> ># SHOW effective_cache_size ;
> > effective_cache_size
> >--
> > 4456
> >(1 row)
>
> Only 35 MB? Are you testing on such a small machine?
Testing on my laptop right now... can't hack on my production DBs the
same way I can my laptop.
> >The stats are attached && bzip2 com
Sean Chittenden <[EMAIL PROTECTED]> writes:
> Which suggests to me that line 3964 in
> ./src/backend/utils/adt/selfuncs.c isn't right for multi-column
> indexes, esp for indexes that are clustered. I don't know how to
> address this though... Tom, any hints?
Yes, we knew that already. Oliver ha
On Fri, 8 Aug 2003 16:53:48 -0700, Sean Chittenden
<[EMAIL PROTECTED]> wrote:
># SHOW effective_cache_size ;
> effective_cache_size
>--
> 4456
>(1 row)
Only 35 MB? Are you testing on such a small machine?
>The stats are attached && bzip2 compressed.
Nothing was attached. Di
Sean Chittenden <[EMAIL PROTECTED]> writes:
> Hrm, after an hour of searching and reading, I think one of the better
> papers on the subject can be found here:
> http://www.cs.ust.hk/faculty/dimitris/PAPERS/TKDE-NNmodels.pdf
Interesting paper, but I don't see the connection to index order
correlat
On Thu, 3 Oct 2002 10:45:08 -0600 (MDT), "scott.marlowe"
<[EMAIL PROTECTED]> wrote:
> effective cache size is the default (i.e. commented out)
The default is 1000, meaning ca. 8 MB, which seems to be way too low.
If your server is (almost) exclusively used by Postgres, try setting
it to represent
On Thu, 03 Oct 2002 14:50:00 -0400, Tom Lane <[EMAIL PROTECTED]>
wrote:
>> indexCorrelation is calculated by dividing the correlation of the
>> first index column by the number of index columns.
>
>Yeah, I concluded later that that was bogus. I've been thinking of
>just using the correlation of t
Manfred Koizar <[EMAIL PROTECTED]> writes:
> Never mind! I just stumbled over those lines in selfuncs.c where
> indexCorrelation is calculated by dividing the correlation of the
> first index column by the number of index columns.
Yeah, I concluded later that that was bogus. I've been thinking
On Thu, 3 Oct 2002 10:59:54 -0600 (MDT), "scott.marlowe"
<[EMAIL PROTECTED]> wrote:
>>are multicolunm indices involved in your estimator problems?
>
>No. Although I use them a fair bit, none of the problems I've encountered
>so far have involved them. But I'd be willing to setup some test index
On Thu, 3 Oct 2002, Manfred Koizar wrote:
> On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), "scott.marlowe"
> <[EMAIL PROTECTED]> wrote:
> >I've found that when the planner misses, sometimes it misses
> >by HUGE amounts on large tables,
>
> Scott,
>
> yet another question: are multicolunm indices inv
On Thu, 3 Oct 2002, Manfred Koizar wrote:
> On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), "scott.marlowe"
> <[EMAIL PROTECTED]> wrote:
> >I'd certainly be willing to do some testing on my own data with them.
>
> Great!
>
> >Gotta patch?
>
> Not yet.
>
> > I've found that when the planner misses
On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), "scott.marlowe"
<[EMAIL PROTECTED]> wrote:
>I've found that when the planner misses, sometimes it misses
>by HUGE amounts on large tables,
Scott,
yet another question: are multicolunm indices involved in your
estimator problems?
Servus
Manfred
---
On Thu, 03 Oct 2002 12:40:20 +0200, I wrote:
>>Gotta patch?
>
>Yes, see below.
Oh, did I mention that inserting some break statements after the
switch cases helps a lot? :-(
Cavus venter non laborat libenter ...
Servus
Manfred
---(end of broadcast)-
On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), "scott.marlowe"
<[EMAIL PROTECTED]> wrote:
>I'd certainly be willing to do some testing on my own data with them.
>Gotta patch?
Yes, see below. Disclaimer: Apart from "make; make check" this is
completely untested. Use at your own risk. Have fun!
Se
On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), "scott.marlowe"
<[EMAIL PROTECTED]> wrote:
>I'd certainly be willing to do some testing on my own data with them.
Great!
>Gotta patch?
Not yet.
> I've found that when the planner misses, sometimes it misses
>by HUGE amounts on large tables, and I ha
On Wed, 02 Oct 2002 18:48:49 -0400, Tom Lane <[EMAIL PROTECTED]>
wrote:
>I don't think it's really a good idea to expect users to pick among
>multiple cost functions
The idea is that PG is shipped with a default representing the best of
our knowledge and users are not encouraged to change it. Wh
Manfred Koizar <[EMAIL PROTECTED]> writes:
> AFAICS (part of) the real problem is in costsize.c:cost_index() where
> IO_cost is calculated from min_IO_cost, pages_fetched,
> random_page_cost, and indexCorrelation. The current implementation
> uses indexCorrelation^2 to interpolate between min_IO_
On Wed, 2 Oct 2002, Manfred Koizar wrote:
> As nobody knows how each of these proposals performs in real life
> under different conditions, I suggest to leave the current
> implementation in, add all three algorithms, and supply a GUC variable
> to select a cost function.
I'd certainly be willin
You all know this FAQ: "Why does Postgres not use my index?" Half of
the time this problem can easily be solved by casting a literal to the
type of the respective column; this is not my topic here.
In many other cases it turns out that the planner over-estimates the
cost of an index scan. Some
29 matches
Mail list logo