On Thu, Jun 5, 2014 at 5:37 PM, Peter Geoghegan <p...@heroku.com> wrote:
> One thing that isn't all that obvious about this worst case is that
> it's in general very qsort() friendly, and therefore the startup costs
> (copying) totally dominates. Actually, you're not even sorting -
> you're verifying that the tuples are already exactly in order (a
> questionable optimization we apply at every level).
At the dev meeting in Ottawa, Kevin mentioned that the Wisconsin courts have columns whose values all begin with "The State of Wisconsin Vs.". I thought this was an interesting case, because it is representative of reality, which is crucially important to consider here. I decided to simulate it, in my original test database:

postgres=# create table wisconsin(casen text);
CREATE TABLE
postgres=# insert into wisconsin select 'The State of Wisconsin Vs. ' || city from cities;
INSERT 0 317102

sort-wisconsin.sql:

select * from (select * from wisconsin order by casen offset 1000000) d;

Master:

pgbench -M prepared -f sort-wisconsin.sql -T 300 -n
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 55
latency average: 5454.545 ms
tps = 0.181191 (including connections establishing)
tps = 0.181191 (excluding connections establishing)

Patch (most recent revision, with ameliorations, HyperLogLog, etc.):

pgbench -M prepared -f sort-wisconsin.sql -T 300 -n
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 55
latency average: 5454.545 ms
tps = 0.182593 (including connections establishing)
tps = 0.182594 (excluding connections establishing)

Earlier patch (no ameliorations for Heikki's case):

pgbench -M prepared -f sort-wisconsin.sql -T 300 -n
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 54
latency average: 5555.556 ms
tps = 0.176914 (including connections establishing)
tps = 0.176915 (excluding connections establishing)

With my most recent revision, the ameliorating measures are effective enough that - with the sortsupport shim and fmgr trampoline avoided - we still come out ahead even in this case. Great. But you may be surprised that the regression is so small even for the patch without any ameliorating measures (the original patch). That's because the data isn't *perfectly* logically/physically correlated here, as it is in Heikki's worst case, so the 317,102 wasted strxfrm() calls are relatively cheap by comparison. Consider how cost_sort() models the cost of a sort when an in-memory quicksort is anticipated:

    /* We'll use plain quicksort on all the input tuples */
    startup_cost += comparison_cost * tuples * LOG2(tuples);

For this quicksort, the planner guesses there'll be "317102 * LOG2(317102)" comparisons - about 5,794,908 comparisons, which implies over 10 times as many strcoll() calls as wasted strxfrm() calls (see the small arithmetic check further below). The cost of those wasted strxfrm() calls starts to look insignificant well before n gets very big (even at n = 100, it's 100 wasted strxfrm() calls against about 664 strcoll() calls). Unless, of course, you have a "bubble sort best case" where everything is already completely in order, in which case there'll be a 1:1 ratio between wasted strxfrm() calls and strcoll() calls.

This "already sorted" check is an optimization that we added to our qsort() ourselves: it doesn't appear in the original NetBSD implementation, in the Bentley/McIlroy paper, or anywhere else that I'm aware of. I'm not the only person to regard it with suspicion - Tom has expressed doubts about it in the past too [1]. A sketch of the kind of check I mean appears below.
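To make the presorted case a little more concrete, here is a minimal, purely illustrative sketch of that kind of check - it is not the actual code in our qsort(), and strcmp() merely stands in for whatever comparator is in use:

#include <stdio.h>
#include <string.h>

/*
 * Illustrative only -- not PostgreSQL's qsort() code.  The idea under
 * discussion: before doing any partitioning, make one pass over the input
 * and bail out if it is already in order.  On fully presorted input the
 * whole "sort" then degenerates to a single O(n) verification pass, so the
 * n wasted strxfrm() calls are matched only 1:1 by comparisons instead of
 * being dwarfed by ~n log n of them.
 */
static int
already_sorted(char **strings, size_t n)
{
    size_t      i;

    for (i = 1; i < n; i++)
    {
        if (strcmp(strings[i - 1], strings[i]) > 0)
            return 0;           /* out of order: do a real quicksort */
    }
    return 1;                   /* already in order: nothing to do */
}

int
main(void)
{
    char       *cases[] = {
        "The State of Wisconsin Vs. Appleton",
        "The State of Wisconsin Vs. Madison",
        "The State of Wisconsin Vs. Milwaukee"
    };

    printf("already sorted: %d\n", already_sorted(cases, 3));
    return 0;
}

The real check in our qsort() is applied at every partitioning level, not just once at the top of the sort.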
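And going back to the cost_sort() figures: this tiny standalone C program (again illustrative only - it is not PostgreSQL code, and the planner's comparison_cost scaling is deliberately omitted) reproduces the numbers above from the same n * LOG2(n) estimate:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double      n = 317102.0;
    double      comparisons = n * log2(n);

    /* roughly 5.79 million comparisons, as estimated above */
    printf("estimated comparisons at n = 317102: %.0f\n", comparisons);

    /* roughly 18 strcoll() calls per wasted strxfrm() call */
    printf("comparisons per tuple: %.1f\n", comparisons / n);

    /* about 664 comparisons against 100 wasted strxfrm() calls */
    printf("estimated comparisons at n = 100: %.0f\n", 100.0 * log2(100.0));

    return 0;
}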
Also, note that no sorting algorithm can do better than O(n log n) in the average case - that's the information-theoretic lower bound on the average-case number of comparisons for any comparison-based sort. To be clear: I'm certainly not saying that we shouldn't fix Heikki's worst case - indeed, I believe I have - but we should also put that worst case in perspective.

By the way, I just realized that I failed to fully remove client overhead (I should have put an extra 0 on the end of my offset for the city.sql query), which added noise to the "city"/"country"/"province" tests. Revised figures are as follows (these are better than before):

Master:
======

pgbench -M prepared -f sort-city.sql -T 300 -n
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 278
latency average: 1079.137 ms
tps = 0.924358 (including connections establishing)
tps = 0.924362 (excluding connections establishing)

Patch:
=====

pgbench -M prepared -f sort-city.sql -T 300 -n
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 1046
latency average: 286.807 ms
tps = 3.486089 (including connections establishing)
tps = 3.486104 (excluding connections establishing)

Master:
======

pgbench -M prepared -j 4 -c 4 -f sort-city.sql -T 300 -n
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 387
latency average: 3100.775 ms
tps = 1.278062 (including connections establishing)
tps = 1.278076 (excluding connections establishing)

Patch:
=====

pgbench -M prepared -j 4 -c 4 -f sort-city.sql -T 300 -n
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 1670
latency average: 718.563 ms
tps = 5.557700 (including connections establishing)
tps = 5.557754 (excluding connections establishing)

BTW, if you want to measure any of this independently, I suggest making sure that power management settings don't skew the results - on Linux, for example, I advise setting the CPU governor to "performance" for each core.

[1] http://www.postgresql.org/message-id/18033.1361789...@sss.pgh.pa.us

--
Peter Geoghegan