On Sun, Sep 25, 2011 at 10:12 PM, Peter Geoghegan <pe...@2ndquadrant.com> wrote: > [ new results ]
Nice results. I think these are far more convincing than the last set, because (1) the gains are bigger and (2) they survive -O2 and (3) you tested an actual query, not just qsort() itself. I don't want to take time to review this in detail right now, because I don't think it would be fair to put this ahead of things that were submitted for the current CommitFest, but I'm impressed. > On the subject of highly ambitious optimisations to sorting, one > possibility I consider much more practicable than GPU-accelerated > sorting is simple threading; quicksort can be parallelised very > effectively, due to its divide-and-conquer nature. If we could agree > on a threading abstraction more sophisticated than forking, it's > something I'd be interested in looking at. To do so would obviously > entail lots of discussion about how that relates to whatever way we > eventually decide on implementing parallel query, and that's obviously > a difficult discussion. I have the same feeling about about this that I do about almost every executor optimization that anyone proposes: the whole project would be entirely simple and relatively painless if it weren't for the need to make planner changes. I mean, deciding on a threading interface is likely to be a somewhat contentious discussion, with differences of opinion on whether we should do it and what the API should look like. But at the end of the day it's not rocket science, and I expect that we would end up with something reasonable. What seems much harder is figuring out how to decide when to perform quicksort in parallel vs. single-threaded, and how much parallelism would be appropriate. I haven't seen anyone propose even a shadow of an idea about how to make such decisions intelligently, either in general or in specific cases. The other issue is that, while threading would be possibly suitable for this particular case, at least for built-in datatypes with comparison operations that basically reduce to single machine-language comparison instructions, it's hard to see how we could take it much further. It would be unsafe for these multiple threads of execution to do anything that could possibly throw an error or anything that touches a lightweight lock or, really, just about anything at all. Trying to make the entire backend thread-safe - or even any significant portion of it - seems like a colossal effort that will most likely fail, but maybe not without eating an enormous amount of developer time first. And without doing that, I don't think we could even extend this as far as, say, numeric, whose functions do things like palloc() and ereport() internally. So I feel like this whole approach might be a dead-end - there's a narrow range of cases where it could be made to work, I think, but after that I think you hit a titanium wall. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers