Hi hackers,

The start-up cost of bounded (top-N) sorts is sensitive at the small end
of N, and the comparison_cost * tuples * LOG2(2.0 * output_tuples) curve
doesn't seem to correspond to reality.
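Only the LOG2(2.0 * output_tuples) factor in that formula varies with N:
for a million input tuples the model charges roughly eight times as much
sort effort at limit 133 as at limit 1, even though a bounded sort still
has to read every input tuple either way.  As a rough sketch, the query
below computes just that sort term, assuming the default comparison_cost
of 2.0 * cpu_operator_cost = 0.005; in the example that follows, the
estimated start-up costs appear to be exactly this term plus a constant
~14425, which I take to be the underlying seq scan with default cost
settings.

  -- Sort term only: comparison_cost * tuples * LOG2(2.0 * output_tuples),
  -- assuming the default comparison_cost = 0.005 and tuples = 1 million.
  select n as limit_n,
         round(0.005 * 1000000 * log(2, 2 * n), 2) as sort_term
  from unnest(array[1, 10, 100, 133]) as n;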
Here's a contrived example that comes from a benchmark:

  create table foo as
    select generate_series(1, 1000000)::int a, (random() * 1000)::int b;
  analyze foo;

  select * from foo order by b limit N;

Estimated start-up costs for different N:

  limit 1   = 19425.00
  limit 2   = 24425.00
  limit 3   = 27349.81
  limit 4   = 29425.00
  limit 10  = 36034.64
  limit 50  = 47644.28
  limit 100 = 52644.28
  limit 133 = 54701.41

But the actual time doesn't really change on this random development
system: it's around 300ms.

I stopped at limit 133 because, for this particular query, a Gather Merge
plan kicked in at 134: I got degree 3 parallel sorting and my answer just
shy of three times faster.  The speed-up definitely paid for the
parallel_setup_cost, and only one tuple was sent from each worker to the
leader because the limit was pushed down.

I want to get my answer three times as fast when I say limit 1 too!  But
I can't, because we seem to think that limit 1 is dramatically cheaper
than limit 134, so parallelism isn't worth bothering with.  The same
effect can be seen with wider tuples, though it gets much more difficult
to convince Gather Merge to be selected at all, for reasons I haven't
looked into yet.

Is Gather Merge for top-N sorts a missed opportunity due to
underestimation of top-N cost for small values of N?  Or is there some
other way to think about this problem?

Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com