On Mon, Apr 20, 2020 at 01:47:29AM +0200, Tomas Vondra wrote:
On Sat, Apr 18, 2020 at 02:23:25PM -0400, James Coleman wrote:
On Thu, Apr 16, 2020 at 9:26 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
I think we have essentially three options:
1) assuming there's just a single group
2) assuming each row is a separate group
3) something in between
If (1) and (2) are worst/best-case scenarios, maybe we should pick
something in between. We have DEFAULT_NUM_DISTINCT (200) which
essentially says "we don't know what the number of groups is" so maybe
we should use that.

I wouldn't recommend picking either the best or worst cases.

Possibly DEFAULT_NUM_DISTINCT is a sane choice, though it's fair to
wonder if it's quite applicable to the case where we already know
we've grouped by some columns.

Do you think defining a new default, say,
DEFAULT_NUM_DISTINCT_PRESORTED is preferred then? And choose some
value like "1/2 of the normal DEFAULT_NUM_DISTINCT groups" or some
such?


If we had a better intuition what a better value is, maybe. But I don't
think we have that at all, so I'd just use the existing one.


I've pushed fix with the DEFAULT_NUM_DISTINCT. The input comes from a
set operation (which is where we call generate_append_tlist), so it's
probably fairly unique, so maybe we should use input_tuples. But it's
not guaranteed, so DEFAULT_NUM_DISTINCT seems reasonably defensive.

One detail I've changed is that instead of matching the expression
directly to a Var, it now calls pull_varnos() to also detect Vars
somewhere deeper. Lookig at examine_variable() it calls find_base_rel
for such case too, but I haven't tried constructing a query triggering
the issue.

One improvement I can think of is handling lists with only some
expressions containing varno 0. We could still call estimate_num_groups
for expressions with varno != 0, and multiply that by the estimate for
the other part (be it DEFAULT_NUM_DISTINCT). This might produce a higher
estimate than just using DEFAULT_NUM_DISTINCT directly, resulting in a
lower incremenal sort cost. But it's not clear to me if this can even
happen - AFAICS either all Vars have varno 0 or none, so I haven't done
this.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply via email to