On Tue, Aug 17, 2021 at 1:47 PM David Rowley <dgrowle...@gmail.com> wrote:
> On Wed, 18 Aug 2021 at 02:42, Zhihong Yu <z...@yugabyte.com> wrote:
> > Since create_partial_distinct_paths() calls
> > create_final_distinct_paths(), I wonder if numDistinctRows can be passed to
> > create_final_distinct_paths() so that the latter doesn't need to call
> > estimate_num_groups().
>
> That can't be done. The two calls to estimate_num_groups() are passing
> in a different number of input rows. In
> create_partial_distinct_paths() the number of rows is the number of
> expected input rows from a partial path. In
> create_final_distinct_paths() when called to complete the final
> distinct step, that's the number of distinct values multiplied by the
> number of workers.
>
> It might be more possible to do something like cache the value of
> distinctExprs, but I just don't feel the need. If there are partial
> paths in the input_rel then it's most likely that planning time is not
> going to dominate much between planning and execution. Also, if we
> were to calculate the value of distinctExprs in create_distinct_paths
> always, then we might end up calculating it for nothing as
> create_final_distinct_paths() does not always need it. I don't feel
> the need to clutter up the code by doing any lazy calculating of it
> either.
>
> David

Hi,
Thanks for your explanation. The patch is good from my point of view.
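
For the archives, the point about the two calls seeing different input row counts can be illustrated with a toy sketch in C. This is not PostgreSQL code: toy_estimate_num_groups() is a hypothetical stand-in that only clamps the distinct count to the input rows, whereas the real estimate_num_groups() does considerably more. The helper names and the numbers in the note below are assumptions made up for illustration.

```c
/*
 * Toy stand-in, NOT PostgreSQL's estimate_num_groups(): the number of
 * groups can never exceed the number of input rows, so clamp to it.
 */
static double
toy_estimate_num_groups(double ndistinct, double input_rows)
{
    return (ndistinct < input_rows) ? ndistinct : input_rows;
}

/* Partial step: the input is the rows expected from one partial path. */
static double
toy_partial_distinct_rows(double ndistinct, double partial_path_rows)
{
    return toy_estimate_num_groups(ndistinct, partial_path_rows);
}

/*
 * Final step: the input is the partial distinct estimate multiplied by
 * the number of workers, so the estimate has to be computed again.
 */
static double
toy_final_distinct_rows(double ndistinct, double partial_distinct_rows,
                        int workers)
{
    return toy_estimate_num_groups(ndistinct,
                                   partial_distinct_rows * workers);
}
```

With 1000 distinct values, 400 rows per partial path and 4 workers, the partial step in this toy model yields 400 while the final step yields 1000, so the value computed for the partial step could not simply be reused for the final one.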