On Thu, Mar 15, 2012 at 8:48 PM, Alvaro Herrera <alvhe...@commandprompt.com> > What Peter proposed seems to me pretty reasonable, in the sense that it > should be possible to come up with a function that creates some text > representation of whatever is in pg_statistic, and another function to > load that data into the new catalog(s). There's no need to keep > pg_statistic binary-compatible, or even continue to have only > pg_statistic (IIRC Zoltan/Hans-Jurgen patch for cross-column stats adds > a new catalog, pg_statistic2 or similar). If the upgrade target release > has room for more/improved stats, that's fine -- they'll be unused after > loading the stats from the dump. And the old stats can be > reacommodated, if necessary.
I have been reading up on selectivity estimation research for the last few days. I must say that I also think that having a text representation as an intermediate won't create a huge maintenance burden. The basic concepts that are there are pretty solid. Conceptually MCV's and histograms continue to be essential even with the more complex approaches. Trying to maintain binary compatibility is probably a bad idea, as Tom noted with the array selectivity patch - encoding of the information could be better. But given a textual format it won't be too hard to just massage the data to the new format. Making it possible to dump and load stats has the additional bonus of enabling more experimentation with custom stats collectors. One could easily prototype the stats collection with R, scipy, etc. Of course the proof will be in the pudding. Re, the patch, current posted WIP cross-col patch doesn't create a new catalog,. It repurposes the stat slots mechanism to store multiple dimensions. But I'll most likely rewrite it to use a separate catalog because the storage requirements are rather different. I'll post a proposal in the appropriate thread when I have decently clear idea how this should work. One thing that seems clear is that multi-dimensional histograms will want this mechanism even more, optimal histogram construction is NP-hard in the multi-dimensional case and so people will want to try different algorithms, or make different tradeoffs on effort spent on constructing the histogram. Or even build one by hand. Cheers, Ants Aasma -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers