On Tue, Aug 17, 2021 at 1:47 PM David Rowley <dgrowle...@gmail.com> wrote:

> On Wed, 18 Aug 2021 at 02:42, Zhihong Yu <z...@yugabyte.com> wrote:
> > Since create_partial_distinct_paths() calls
> > create_final_distinct_paths(), I wonder if numDistinctRows can be passed to
> > create_final_distinct_paths() so that the latter doesn't need to call
> > estimate_num_groups().
>
> That can't be done. The two calls to estimate_num_groups() are passing
> in a different number of input rows.  In
> create_partial_distinct_paths() the number of rows is the number of
> expected input rows from a partial path.  In
> create_final_distinct_paths() when called to complete the final
> distinct step, that's the number of distinct values multiplied by the
> number of workers.
>
> It might be more feasible to cache the value of
> distinctExprs, but I just don't feel the need.  If there are partial
> paths in the input_rel then planning time is unlikely to be
> significant compared with execution time. Also, if we
> were to calculate the value of distinctExprs in create_distinct_paths
> always, then we might end up calculating it for nothing as
> create_final_distinct_paths() does not always need it. I don't feel
> the need to clutter up the code by doing any lazy calculating of it
> either.
>
> David
>
Hi,
Thanks for your explanation.

The patch is good from my point of view.
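
To make sure I follow the point about the two row counts, here is a rough
sketch in C. It is only an illustration: toy_estimate_groups() is a
hypothetical stand-in that clamps an assumed number of distinct values to the
input row count, not PostgreSQL's estimate_num_groups(), and
final_step_input_rows() is likewise a made-up helper name.

```c
/*
 * Hypothetical stand-in for a group-count estimator (an assumption for
 * illustration, NOT PostgreSQL's estimate_num_groups()): the estimate can
 * never exceed the number of input rows.
 */
static double
toy_estimate_groups(double ndistinct, double input_rows)
{
    return (ndistinct < input_rows) ? ndistinct : input_rows;
}

/*
 * Hypothetical helper: rows fed to the final distinct step, taken here as
 * the per-worker distinct estimate multiplied by the number of workers,
 * as described in the explanation quoted above.
 */
static double
final_step_input_rows(double partial_distinct_rows, int nworkers)
{
    return partial_distinct_rows * (double) nworkers;
}
```

With an assumed 1000 distinct values, 250 rows per partial path and 4 workers,
the partial step estimates 250 groups, the final step sees 250 * 4 = 1000
input rows, and its own estimate is 1000. Reusing the partial estimate of 250
for the final step would therefore be wrong in this toy setup.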
