On Fri, Dec 15, 2017 at 10:05 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Tue, Dec 12, 2017 at 10:46 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>>
>> Hi hackers,
>>
>> The start-up cost of bounded (top-N) sorts is sensitive at the small
>> end of N, and the
>> comparison_cost * tuples * LOG2(2.0 * output_tuples) curve doesn't
>> seem to correspond to reality.  Here's a contrived example that comes
>> from a benchmark:
>>
>> create table foo as
>>   select generate_series(1, 1000000)::int a, (random() * 1000)::int b;
>> analyze foo;
>> select * from foo order by b limit N;
>
> This looks like a costing bug.  The estimated cost of sorting 416,667
> estimated tuples in one parallel worker is almost identical to the
> estimated cost of sorting 1,000,000 tuples when parallelism is turned
> off.  Adding some logging to the cost_sort function, it looks like the
> limit does not get sent down for the parallel estimate:
>
> NOTICE:  JJ cost_sort tuples 1000000.000000, limit 61.000000, sort_mem 65536
> NOTICE:  JJ cost_sort tuples 416667.000000, limit -1.000000, sort_mem 65536
>
> So the LIMIT gets passed down for the execution step, but not for the
> planning step.
Oh, well spotted.  I was looking in the wrong place.  Maybe the fix is
as simple as the attached?  It certainly helps in the cases I tested,
including with wider tables.

-- 
Thomas Munro
http://www.enterprisedb.com
gather-merge-limit.patch
Description: Binary data
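
[For readers following along: below is a rough standalone sketch, not
PostgreSQL source, of the bounded-sort cost term quoted above,
comparison_cost * tuples * LOG2(2.0 * output_tuples), plugging in the
tuple counts and the 61-row limit from the cost_sort NOTICE lines in
Jeff's message.  The comparison_cost value of 2 * cpu_operator_cost
(0.0025 by default) is an assumption based on the default cost
settings, and the numbers ignore the other terms cost_sort adds, so
they only illustrate the shape of the curve.]

/* sort_cost_sketch.c: compile with  cc sort_cost_sketch.c -lm */
#include <math.h>
#include <stdio.h>

static double
sort_compare_cost(double tuples, double limit_tuples, double comparison_cost)
{
	double		output_tuples;

	/* A limit of -1 means "no limit", so the whole input is sorted. */
	if (limit_tuples > 0 && limit_tuples < tuples)
		output_tuples = limit_tuples;	/* bounded (top-N) sort */
	else
		output_tuples = tuples;			/* full sort */

	return comparison_cost * tuples * log2(2.0 * output_tuples);
}

int
main(void)
{
	/* Assumed: 2 * default cpu_operator_cost, per the default settings. */
	double		comparison_cost = 2.0 * 0.0025;

	/* Serial plan: 1,000,000 tuples, LIMIT 61 passed down. */
	printf("serial:   %.0f\n",
		   sort_compare_cost(1000000.0, 61.0, comparison_cost));

	/* One parallel worker: ~416,667 tuples, but limit = -1 (not passed down). */
	printf("parallel: %.0f\n",
		   sort_compare_cost(416667.0, -1.0, comparison_cost));

	return 0;
}

[With those inputs the per-worker comparison term comes out comparable
to, in fact slightly larger than, the serial bounded-sort term, which
is consistent with Jeff's observation that the parallel path is being
costed as a full sort because the limit is not passed down.]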