>Recent discussions on the threads "Double sorting split patch" and
>"CUDA sorting" raised the possibility that there could be significant >performance optimisation "low-hanging fruit" picked by having the >executor treat integers and floats as a special case during sorting, >avoiding going to the trouble of calling a comparator using the >built-in SQL function machinery Why only for integers and floats why not for char/varchar? But I believe this can make code less maintainable as similar things can be done at other places to avoid SQL function machinery. >Once the cache has been warmed, explain analyze very consistently >reports a runtime of 123ms for this query on master/HEAD, which varies >+/- 1 ms, with a few outliers of maybe +/- 2ms. However, when I apply >this patch, that goes down to 107ms +/- 1ms at -O0. Time 123ms which is without your change is with which optimization -O2 or O0? **************************************************************************** *********** This e-mail and attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient's) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! -----Original Message----- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Peter Geoghegan Sent: Tuesday, September 20, 2011 7:26 AM To: PG Hackers Subject: [HACKERS] Inlining comparators as a performance optimisation Recent discussions on the threads "Double sorting split patch" and "CUDA sorting" raised the possibility that there could be significant performance optimisation "low-hanging fruit" picked by having the executor treat integers and floats as a special case during sorting, avoiding going to the trouble of calling a comparator using the built-in SQL function machinery, and taking advantage of inlining of the comparator, which has been shown to have a considerable performance advantage (at least compared to a general purpose c stdlib qsort(), that takes a function pointer as its comparator, much like tuplesort). I've hacked together a sloppy POC implementation in a hurry (basically, some code is shifted around) , which is attached - I felt that it would be useful to determine if the community feels that this is a worth-while undertaking in advance of a business trip that I'm leaving on tomorrow lasting until Friday, during which I will be mostly unavailable. The patch breaks the Postgres sorting executor node (at least when it quicksorts) for any type other than int4. I apologise for how rough the patch is, but the code itself isn't important right now - the idea is. I anticipate that the value state->datumType or something similar will be set in a near future revision, so that tuplesort_performsort will know which type-specific optimisation it can use for the type, while falling back on the existing generic qsort_arg + qsort_arg_comparator, and sorting won't actually be broken. I've been doing some preliminary testing using the dell store 2 sample database. I increase work_mem to '50MB', to ensure that a quicksort will be performed for sorting (otherwise, I'm using the postgresql.conf that initdb gave me). The query is: explain analyze select * from orderlines order by prod_id; Once the cache has been warmed, explain analyze very consistently reports a runtime of 123ms for this query on master/HEAD, which varies +/- 1 ms, with a few outliers of maybe +/- 2ms. However, when I apply this patch, that goes down to 107ms +/- 1ms at -O0. I think that that's a pretty good start. Funnily enough, the difference/advantage vanishes at -O2 (I'm guessing that the higher optimisation level of GCC 4.5 hyper-corrects away the inlining, but I don't have time to check that right now). I imagine the version that I actually submit for patch review will have a macro-based infrastructure for inlining the sorting of various built-in types, initially integers and floats. It will most likely have some other optimisations - I haven't even used a profiler yet. This performance patch differs from most in that it's difficult in principle to imagine a performance regression occurring. Thoughts? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services