On 8 February 2012 15:17, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Feb 8, 2012 at 9:51 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> IMO this patch is already well past the point of diminishing returns in
>> value-per-byte-added.  I'd like to see it trimmed back to provide a fast
>> path for just single-column int4/int8/float4/float8 sorts.  The other
>> cases aren't going to offer enough of a win to justify the code space.
>
> I'm curious about how much we're gaining from the single-column
> specializations vs. the type-specific specializations.  I think I'm
> going to go try to characterize that.
I think it might make more sense to lose the type-specific specialisations for the multi-key case while adding a generic multi-key specialisation, than to lose all multi-key specialisations, though I have not considered that question at length, and I'd think that we'd still want to keep an int4 version in that case. Note that I *did not* include a generic multi-key specialisation, though only because I saw little point, having already covered by far the most common cases.

While you're at it, I'd suggest that you benchmark a multi-key specialisation as well, so that we can see just what we'd be throwing away before we do so. Better that those numbers come from you.

I continue to maintain that the most appropriate course of action is to provisionally commit all specialisations. If it's hard to know what effect this is going to have on real workloads, let's defer to beta testers, who presumably try the new release out with their applications. It's a question you could put to them squarely, and gradually rolling back from that initial position wouldn't be much of a problem.

The mysql-server package is 45 MB on Fedora 16. That "1% of the Postgres binary" figure is for my earlier patch with btree specialisations, right?

I'm not asking you to look at that right now. I also don't think that "where do we eventually draw the line with specialisations like this in Postgres generally?" is a question that you should expect me to answer, though I will say that we should look at each case on its merits.

I have not "totally denied" binary bloat costs. I have attempted to quantify them, while acknowledging that such a task is difficult, as was evident from the fact that Robert "wasn't surprised" that I could not demonstrate any regression. Granted, my standard for ruling out a regression is that there is very clearly no net loss in performance at some reasonable granularity, which is a very practical definition. You can quite easily contrive a case that HOT handles really badly.
Some people did, I believe, but HOT won out because it was clearly very useful in the real world.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services