On Thu, Apr 23, 2020 at 6:59 AM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:

> I've pushed fix with the DEFAULT_NUM_DISTINCT. The input comes from a
> set operation (which is where we call generate_append_tlist), so it's
> probably fairly unique, so maybe we should use input_tuples. But it's
> not guaranteed, so DEFAULT_NUM_DISTINCT seems reasonably defensive.
>

Thanks for the fix. Verified that the crash has been fixed.

>
> One detail I've changed is that instead of matching the expression
> directly to a Var, it now calls pull_varnos() to also detect Vars
> somewhere deeper. Looking at examine_variable() it calls find_base_rel
> for such case too, but I haven't tried constructing a query triggering
> the issue.
>

A minor comment: I don't think we need to strip the relabel explicitly
before calling pull_varnos(), because that function already recurses
into T_RelabelType nodes.

Also, do we need to call bms_free(varnos) for each pathkey here, to
avoid wasting memory?

>
> One improvement I can think of is handling lists with only some
> expressions containing varno 0. We could still call estimate_num_groups
> for expressions with varno != 0, and multiply that by the estimate for
> the other part (be it DEFAULT_NUM_DISTINCT). This might produce a higher
> estimate than just using DEFAULT_NUM_DISTINCT directly, resulting in a
> lower incremental sort cost. But it's not clear to me if this can even
> happen - AFAICS either all Vars have varno 0 or none, so I haven't done
> this.
>

I don't think this case would happen either.

Thanks
Richard
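
To illustrate the relabel point, here is a rough standalone sketch. It is
a toy model, not PostgreSQL code: every toy_/Toy name is made up for
illustration, and the varno set is a plain 64-bit mask rather than a
Bitmapset. It only shows why a walker that already descends through
relabel wrappers returns the same varnos with or without an explicit
strip beforehand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy node kinds (hypothetical; these are not PostgreSQL node tags). */
typedef enum { TOY_VAR, TOY_RELABEL, TOY_OP } ToyTag;

typedef struct ToyNode
{
    ToyTag      tag;
    unsigned    varno;              /* meaningful for TOY_VAR only */
    const struct ToyNode *left;     /* TOY_RELABEL: wrapped arg; TOY_OP: left operand */
    const struct ToyNode *right;    /* TOY_OP only */
} ToyNode;

/* Collect varnos as a bitmask: bit n is set when varno n appears. */
static uint64_t
toy_pull_varnos(const ToyNode *node)
{
    if (node == NULL)
        return 0;

    switch (node->tag)
    {
        case TOY_VAR:
            assert(node->varno < 64);   /* limit of this toy mask */
            return UINT64_C(1) << node->varno;
        case TOY_RELABEL:
            /* The walker looks through the wrapper by itself. */
            return toy_pull_varnos(node->left);
        case TOY_OP:
            return toy_pull_varnos(node->left) |
                   toy_pull_varnos(node->right);
    }
    return 0;
}

/* Explicit stripping of outer relabel wrappers, shown for comparison. */
static const ToyNode *
toy_strip_relabel(const ToyNode *node)
{
    while (node != NULL && node->tag == TOY_RELABEL)
        node = node->left;
    return node;
}

/* True when some Var in the expression has varno 0. */
static bool
toy_has_varno_zero(const ToyNode *node)
{
    return (toy_pull_varnos(node) & UINT64_C(1)) != 0;
}
```

For any expression e in this model, toy_pull_varnos(e) equals
toy_pull_varnos(toy_strip_relabel(e)), so the strip is redundant before
the varno check. Because the toy set is an integer there is nothing to
free; with a palloc'd set per pathkey, the bms_free() question above
still stands.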