On Mon, Aug 30, 2021 at 9:00 AM Tomas Vondra <tomas.von...@enterprisedb.com> wrote:
> On 8/28/21 6:30 PM, Mark Dilger wrote: > > > > > >> On Aug 28, 2021, at 6:52 AM, Tomas Vondra > >> <tomas.von...@enterprisedb.com> wrote: > >> > >> Part 0003 fixes handling of those clauses so that we don't treat > >> them as simple, but it does that by tweaking > >> statext_is_compatible_clause(), as suggested by Dean. > > > > Function examine_opclause_args() doesn't set issimple to anything in > > the IsA(rightop, Const) case, but assigns *issimplep = issimple at > > the bottom. The compiler is not complaining about using a possibly > > uninitialized variable, but if I change the "return true" on the very > > next line to "return issimple", the compiler complains quite loudly. > > > > Yeah, true. Thanks for noticing this was a bug - I forgot to set the > issimple variable in the first branch. > > > > > Some functions define bool *issimple, others bool *issimplep and bool > > issimple. You might want to standardize the naming. > > > > I think the naming is standard with respect to the surrounding code. If > the other parameters use "p" to mark "pointer" then issimplep is used, > but in other places it's just "issimple". IMHO this is appropriate. > > > It's difficult to know what "simple" means in extended_stats.c. > > There is no file-global comment explaining the concept, and functions > > like compare_scalars_simple don't have correlates named > > compare_scalars_complex or such, so the reader cannot infer by > > comparison what the difference might be between a "simple" case and > > some non-"simple" case. The functions' issimple (or issimplep) > > argument are undocumented. > > > > There is a comment: > > > > /* * statext_mcv_clauselist_selectivity * Estimate clauses using > > the best multi-column statistics. .... * * - simple selectivity: > > Computed without extended statistics, i.e. as if the * > > columns/clauses were independent. * .... */ > > > > but it takes a while to find if you search for "issimple". > > > > Yeah, true. This was added a while ago when Dean reworked the estimation > (based on MCV), and it seemed clear back then. But now a comment > explaining this concept (and how it affects the estimation) would be > helpful. I'll try digging in the archives for the details. > > > > > In both scalarineqsel_wrapper() and eqsel_internal(), the call to > > matching_restriction_variables() should usually return false, since > > comparing a variable to itself is an unusual case. The next call is > > to get_restriction_variable(), which repeats the work of examining > > the left and right variables. So in almost all cases, after throwing > > away the results of: > > > > examine_variable(root, left, varRelid, &ldata); > > examine_variable(root, right, varRelid, &rdata); > > > > performed in matching_restriction_variables(), we'll do exactly the > > same work again (with one variable named differently) in > > get_restriction_variable(): > > > > examine_variable(root, left, varRelid, vardata); > > examine_variable(root, right, varRelid, &rdata); > > > > That'd be fine if example_variable() were a cheap function, but it > > appears not to be. Do you think you could save the results rather > > than recomputing them? It's a little messy, since these are the only > > two functions out of about ten which follow this pattern, so you'd > > have to pass NULLs into get_restriction_variable() from the other > > eight callers, but it still looks like that would be a win. > > > > I had similar concerns, although I don't think those functions are very > expensive compared to the rest of the estimation code. I haven't done > any measurements yet, though. > > But I don't think saving the results is the way to go - in a way, we > already store the stats (which seems like the most expensive bit) in > syscache. It seems better to just simplify examine_variable() so that it > does not lookup the statistics, which we don't need here at all. > > > The attached version of the patches fixes the other bugs reported here > so far - most importantly it reworks how we set issimple while examining > the clauses, so that it's never skips the initialization. Hopefully the > added comments also explain it a bit more clearly. > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > Hi, For patch 0002, + s1 = statext_clauselist_selectivity(root, clauses, varRelid, + jointype, sjinfo, rel, + &estimatedclauses, false); + + estimated = (bms_num_members(estimatedclauses) == 1); I took a look at clauselist_apply_dependencies() (called by statext_clauselist_selectivity) where estimatedclauses is modified. Since the caller would not use the returned Selectivity if number of elements in estimatedclauses is greater than 1, I wonder if a parameter can be added to clauselist_apply_dependencies() which indicates early return if the second element is added to estimatedclauses. Cheers