On Thu, Apr 23, 2020 at 6:59 AM Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote:

> I've pushed fix with the DEFAULT_NUM_DISTINCT. The input comes from a
> set operation (which is where we call generate_append_tlist), so it's
> probably fairly unique, so maybe we should use input_tuples. But it's
> not guaranteed, so DEFAULT_NUM_DISTINCT seems reasonably defensive.
>

Thanks for the fix. Verified that the crash has been fixed.


>
> One detail I've changed is that instead of matching the expression
> directly to a Var, it now calls pull_varnos() to also detect Vars
> somewhere deeper. Looking at examine_variable() it calls find_base_rel
> for such case too, but I haven't tried constructing a query triggering
> the issue.
>

A minor comment: I don't think we need to strip the relabel explicitly
before calling pull_varnos(), because that function already recurses
into T_RelabelType nodes.

Also, do we need to call bms_free(varnos) for each pathkey here to avoid
wasting memory?
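
To make the question concrete, here is a rough standalone sketch of the
pattern I have in mind. It is a toy model, not planner code: ToySet,
toy_pull_varnos(), toy_is_member() and toy_free() are hypothetical
stand-ins for Bitmapset, pull_varnos(), bms_is_member() and bms_free(),
and live_sets exists only to show that nothing stays allocated after
the loop.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a Bitmapset; NOT the PostgreSQL type. */
typedef struct ToySet
{
    unsigned long word;         /* bit i set => varno i is a member */
} ToySet;

static int live_sets = 0;       /* sets allocated and not yet freed */

/* Toy pull_varnos(): allocates a fresh one-member set (varno < 32). */
static ToySet *
toy_pull_varnos(int varno)
{
    ToySet *s = calloc(1, sizeof(ToySet));

    if (s == NULL)
        abort();
    s->word = 1UL << varno;
    live_sets++;
    return s;
}

static int
toy_is_member(int x, const ToySet *s)
{
    return (int) ((s->word >> x) & 1UL);
}

static void
toy_free(ToySet *s)
{
    if (s != NULL)
    {
        free(s);
        live_sets--;
    }
}

/* Returns 1 if any key references varno 0, freeing each set as we go. */
static int
any_key_has_varno_zero(const int *key_varnos, int nkeys)
{
    int found = 0;

    for (int i = 0; i < nkeys && !found; i++)
    {
        ToySet *varnos = toy_pull_varnos(key_varnos[i]);

        if (toy_is_member(0, varnos))
            found = 1;

        /* free before the next iteration (or the early exit) */
        toy_free(varnos);
    }
    return found;
}
```

Freeing inside the loop keeps at most one set allocated at a time,
whichever way the loop exits.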


>
> One improvement I can think of is handling lists with only some
> expressions containing varno 0. We could still call estimate_num_groups
> for expressions with varno != 0, and multiply that by the estimate for
> the other part (be it DEFAULT_NUM_DISTINCT). This might produce a higher
> estimate than just using DEFAULT_NUM_DISTINCT directly, resulting in a
> lower incremental sort cost. But it's not clear to me if this can even
> happen - AFAICS either all Vars have varno 0 or none, so I haven't done
> this.
>

I don't think this case would happen either.
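
Still, for reference, the combined estimate you describe could be
sketched roughly as below. This is only a toy model under my own
assumptions: ToyExpr and toy_estimate_groups() are hypothetical names,
multiplying per-expression ndistinct values is a crude stand-in for
what estimate_num_groups() really does, and the fallback constant is
assumed to be 200 here.

```c
/* Assumed fallback for expressions we cannot look up (varno 0). */
#define TOY_DEFAULT_NUM_DISTINCT 200.0

/* Hypothetical per-expression input; not a planner structure. */
typedef struct ToyExpr
{
    int     varno;              /* 0 means no base relation is known */
    double  ndistinct;          /* used only when varno != 0 */
} ToyExpr;

/*
 * Estimate the number of groups for a list of expressions: multiply the
 * ndistinct values of the expressions with varno != 0, apply the
 * fallback once if any expression has varno 0, and clamp the result to
 * the range [1, input_tuples].
 */
static double
toy_estimate_groups(const ToyExpr *exprs, int nexprs, double input_tuples)
{
    double  known = 1.0;
    int     has_unknown = 0;
    double  est;

    for (int i = 0; i < nexprs; i++)
    {
        if (exprs[i].varno == 0)
            has_unknown = 1;
        else
            known *= exprs[i].ndistinct;
    }

    est = known;
    if (has_unknown)
        est *= TOY_DEFAULT_NUM_DISTINCT;

    if (est > input_tuples)
        est = input_tuples;
    if (est < 1.0)
        est = 1.0;
    return est;
}
```

With one known expression of 10 distinct values plus one varno 0
expression, this gives 10 * 200 groups instead of 200, which is the
"higher estimate" effect you mention.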

Thanks
Richard
