I can reproduce this problem with the query below.

explain (costs on) select * from tenk1 order by twenty;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Gather Merge  (cost=772.11..830.93 rows=5882 width=244)
   Workers Planned: 1
   ->  Sort  (cost=772.10..786.80 rows=5882 width=244)
         Sort Key: twenty
         ->  Parallel Seq Scan on tenk1  (cost=0.00..403.82 rows=5882 width=244)
(5 rows)

On Tue, Jul 16, 2024 at 3:56 PM Anthonin Bonnefoy
<anthonin.bonne...@datadoghq.com> wrote:
> The initial goal was to use the source tuples if available and avoid
> possible rounding errors. Though I realise that the difference would
> be minimal. For example, 200K tuples and 2 workers would yield
> int(int(200000 / 2.4) * 2.4) = 199999. That is probably not worth the
> additional complexity, so I've updated the patch to just use
> gather_rows_estimate.
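For what it's worth, the round trip in the quoted example is easy to check directly. (Quick sketch; the 2.4 divisor is what get_parallel_divisor() returns for 2 workers with a participating leader.)

```python
# Quick check of the rounding in the quoted example.  A divisor of 2.4
# corresponds to 2 workers with the leader also participating:
# 2 + (1 - 0.3 * 2) = 2.4, per get_parallel_divisor().
total_rows = 200000
divisor = 2.4

per_worker = int(total_rows / divisor)   # per-worker (parallel path) estimate
round_trip = int(per_worker * divisor)   # scaled back up at the Gather

print(per_worker, round_trip)  # 83333 199999
```

So the round trip loses a single row out of 200K, which supports the point that it's not worth extra complexity.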

I wonder if the changes in create_ordered_paths should also be reduced
to 'total_groups = gather_rows_estimate(path);'.

> I've also realised from the comments in optimizer.h that
> nodes/pathnodes.h should not be included there and fixed it.

I think perhaps it's better to declare gather_rows_estimate() in
cost.h rather than optimizer.h.
(BTW, I wonder if compute_gather_rows() would be a better name?)
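For reference, here is a stand-alone sketch of the computation such a helper would perform, mirroring get_parallel_divisor()'s leader-participation logic. This is an illustration only, not the actual patch code:

```python
# Stand-alone sketch (not the actual PostgreSQL code) of the estimate
# a gather_rows_estimate()/compute_gather_rows() helper would produce.
def parallel_divisor(workers: int) -> float:
    # Mirrors get_parallel_divisor(): the leader contributes until its
    # contribution (1 - 0.3 per worker) drops to zero at 4 workers.
    divisor = float(workers)
    leader = 1.0 - 0.3 * workers
    if leader > 0:
        divisor += leader
    return divisor

def gather_rows(path_rows: float, workers: int) -> float:
    # Scale the per-worker estimate back up to a total, clamped to >= 1
    # (akin to clamp_row_est()).
    return max(1, round(path_rows * parallel_divisor(workers)))

# 5882 per-worker rows with 1 planned worker, as in the plan above:
print(gather_rows(5882, 1))  # prints 9999
```

With the per-worker estimate of 5882 and one planned worker this gives 9999, close to tenk1's actual 10000 rows, rather than the 5882 shown at the Gather Merge node in the plan above.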

I noticed another issue in generate_useful_gather_paths() -- *rowsp
would be left with a garbage (uninitialized) value if override_rows is
true and we use incremental sort for gather merge.  I think we should
fix this too.

Thanks
Richard

