I can reproduce this problem with the query below.

explain (costs on) select * from tenk1 order by twenty;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Gather Merge  (cost=772.11..830.93 rows=5882 width=244)
   Workers Planned: 1
   ->  Sort  (cost=772.10..786.80 rows=5882 width=244)
         Sort Key: twenty
         ->  Parallel Seq Scan on tenk1  (cost=0.00..403.82 rows=5882 width=244)
(5 rows)
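As a rough sketch of where the 5882 estimate above comes from (assuming the usual parallel divisor of 1.7 for one planned worker with a participating leader; the helper name below is illustrative, not the actual planner code):

```python
# Illustrative sketch: partial-path row estimation for tenk1 (10000 rows)
# with 1 planned worker. Assumption: divisor 1.7 mirrors a worker count
# of 1 plus a 0.7 leader contribution.
def clamp_row_est(nrows):
    # Round to an integer and keep the estimate at least 1.
    return max(int(round(nrows)), 1)

total_rows = 10000        # rows in tenk1
parallel_divisor = 1.7    # 1 worker + 0.7 leader contribution (assumed)

partial_rows = clamp_row_est(total_rows / parallel_divisor)
print(partial_rows)       # 5882 -- matches the Parallel Seq Scan estimate

# Scaling back up at the Gather Merge does not recover 10000 exactly:
print(clamp_row_est(partial_rows * parallel_divisor))  # 9999
```

The point of the reproduction is that the Gather Merge node reports the partial-path row count (5882) instead of an estimate of the total row count.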
On Tue, Jul 16, 2024 at 3:56 PM Anthonin Bonnefoy
<anthonin.bonne...@datadoghq.com> wrote:
> The initial goal was to use the source tuples if available and avoid
> possible rounding errors. Though I realise that the difference would
> be minimal. For example, 200K tuples and 3 workers would yield
> int(int(200000 / 2.4) * 2.4)=199999. That is probably not worth the
> additional complexity, I've updated the patch to just use
> gather_rows_estimate.

I wonder if the changes in create_ordered_paths should also be reduced
to 'total_groups = gather_rows_estimate(path);'.

> I've also realised from the comments in optimizer.h that
> nodes/pathnodes.h should not be included there and fixed it.

I think perhaps it's better to declare gather_rows_estimate() in
cost.h rather than optimizer.h.

(BTW, I wonder if compute_gather_rows() would be a better name?)

I noticed another issue in generate_useful_gather_paths() -- *rowsp
would have a random value if override_rows is true and we use
incremental sort for gather merge.  I think we should fix this too.

Thanks
Richard
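P.S. To make the quoted rounding example concrete (the 200K tuple count and 2.4 divisor are taken from the message above):

```python
# Round-tripping a row estimate through the parallel divisor loses
# precision: truncating after the division means multiplying back
# does not restore the original count.
tuples = 200000
divisor = 2.4

partial = int(tuples / divisor)       # 83333
recovered = int(partial * divisor)    # 199999, not 200000
print(partial, recovered)
```

A one-row discrepancy like this is why the extra complexity of tracking the source tuple count was judged not worth it.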