On Mon, Aug 30, 2021 at 9:00 AM Tomas Vondra <tomas.von...@enterprisedb.com>
wrote:

> On 8/28/21 6:30 PM, Mark Dilger wrote:
> >
> >
> >> On Aug 28, 2021, at 6:52 AM, Tomas Vondra
> >> <tomas.von...@enterprisedb.com> wrote:
> >>
> >> Part 0003 fixes handling of those clauses so that we don't treat
> >> them as simple, but it does that by tweaking
> >> statext_is_compatible_clause(), as suggested by Dean.
> >
> > Function examine_opclause_args() doesn't set issimple to anything in
> > the IsA(rightop, Const) case, but assigns *issimplep = issimple at
> > the bottom.  The compiler is not complaining about using a possibly
> > uninitialized variable, but if I change the "return true" on the very
> > next line to "return issimple", the compiler complains quite loudly.
> >
>
> Yeah, true. Thanks for noticing this was a bug - I forgot to set the
> issimple variable in the first branch.
>
> >
> > Some functions define bool *issimple, others bool *issimplep and bool
> > issimple.  You might want to standardize the naming.
> >
>
> I think the naming is standard with respect to the surrounding code. If
> the other parameters use "p" to mark "pointer" then issimplep is used,
> but in other places it's just "issimple". IMHO this is appropriate.
>
> > It's difficult to know what "simple" means in extended_stats.c.
> > There is no file-global comment explaining the concept, and functions
> > like compare_scalars_simple don't have correlates named
> > compare_scalars_complex or such, so the reader cannot infer by
> > comparison what the difference might be between a "simple" case and
> > some non-"simple" case.  The functions' issimple (or issimplep)
> > argument are undocumented.
> >
> > There is a comment:
> >
> > /* * statext_mcv_clauselist_selectivity *      Estimate clauses using
> > the best multi-column statistics. .... * * - simple selectivity:
> > Computed without extended statistics, i.e. as if the *
> > columns/clauses were independent. * .... */
> >
> > but it takes a while to find if you search for "issimple".
> >
>
> Yeah, true. This was added a while ago when Dean reworked the estimation
> (based on MCV), and it seemed clear back then. But now a comment
> explaining this concept (and how it affects the estimation) would be
> helpful. I'll try digging in the archives for the details.
>
> >
> > In both scalarineqsel_wrapper() and eqsel_internal(), the call to
> > matching_restriction_variables() should usually return false, since
> > comparing a variable to itself is an unusual case.  The next call is
> > to get_restriction_variable(), which repeats the work of examining
> > the left and right variables.  So in almost all cases, after throwing
> > away the results of:
> >
> > examine_variable(root, left, varRelid, &ldata);
> > examine_variable(root, right, varRelid, &rdata);
> >
> > performed in matching_restriction_variables(), we'll do exactly the
> > same work again (with one variable named differently) in
> > get_restriction_variable():
> >
> > examine_variable(root, left, varRelid, vardata);
> > examine_variable(root, right, varRelid, &rdata);
> >
> > That'd be fine if example_variable() were a cheap function, but it
> > appears not to be.  Do you think you could save the results rather
> > than recomputing them?  It's a little messy, since these are the only
> > two functions out of about ten which follow this pattern, so you'd
> > have to pass NULLs into get_restriction_variable() from the other
> > eight callers, but it still looks like that would be a win.
> >
>
> I had similar concerns, although I don't think those functions are very
> expensive compared to the rest of the estimation code. I haven't done
> any measurements yet, though.
>
> But I don't think saving the results is the way to go - in a way, we
> already store the stats (which seems like the most expensive bit) in
> syscache. It seems better to just simplify examine_variable() so that it
> does not lookup the statistics, which we don't need here at all.
>
>
> The attached version of the patches fixes the other bugs reported here
> so far - most importantly it reworks how we set issimple while examining
> the clauses, so that it's never skips the initialization. Hopefully the
> added comments also explain it a bit more clearly.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
Hi,
For patch 0002,

+                   s1 = statext_clauselist_selectivity(root, clauses,
varRelid,
+                                                       jointype, sjinfo,
rel,
+                                                       &estimatedclauses,
false);
+
+                   estimated = (bms_num_members(estimatedclauses) == 1);

I took a look at clauselist_apply_dependencies() (called by
statext_clauselist_selectivity) where estimatedclauses is modified.
Since the caller would not use the returned Selectivity if number of
elements in estimatedclauses is greater than 1, I wonder
if a parameter can be added to clauselist_apply_dependencies() which
indicates early return if the second element is added to estimatedclauses.

Cheers

Reply via email to