Dne 13.12.2010 22:50, Josh Berkus napsal(a): > Tomas, > >> (a) find out what statistics do we need to collect and how to use it >> (b) implement a really stupid inefficient solution >> (c) optimize in iterations, i.e. making it faster, consuming less >> space etc. > > I'll suggest again how to decide *which* columns to cross: whichever > columns are combined in composite indexes. In version 2, allow the DBA > to specify combinations. > > In the unlikely event that correlation could be reduced to a single > float number, it would be conceivable for each column to have an array > of correlation stats for every other column where correlation was > non-random; on most tables (i.e. ones with less than 100 columns) we're > not talking about that much storage space. > > The main cost would be the time spent collecting that info ...
I think this is a bit early to discuss this, given the fact that we don't have a working solution yet. But OK, let's discuss these options anyway 1) collecting the automatically for composite indexes I don't think this is wise idea. The first versions definitely won't be very efficient, and collecting the data for each composite index means everyone will be hit by this inefficiency, even if he actually does not need that (e.g. the columns are independent so the current estimates are quite accurate or he's not using those columns very often in the same WHERE clause). Another reason against this is that many DBAs don't actually use composed indexes - they simply create indexes on each column and let the bitmap index scan to work it out. And this would not work for this case. And actually it's not very complicated to allow the DBA to do this, this can be a quite simple PL/pgSQL procedure. 2) collecting correlation for each pair of columns Again, you're effectively forcing everyone to pay the price even though he may not need the feature. Maybe we'll get there one day, but it's not a good idea to do that from the beginning. And the correlation itself has a very limited use in real life, as it's not possible to compute it for character columns and is not very useful in case of some numerical columns (e.g. ZIP codes). regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers