Hello.

At Wed, 13 Mar 2019 02:25:40 +0100, Tomas Vondra <tomas.von...@2ndquadrant.com> 
wrote in <19f76496-dcf3-ccea-dd82-26fbed57b...@2ndquadrant.com>
> Hi,
> 
> attached is an updated version of the patch, addressing most of the
> issues raised in the recent reviews. There are two main exceptions:
> 
> 1) I haven't reworked the regression tests to use a function to check
> cardinality estimates and making them faster.
> 
> 2) Review handling of bitmap in statext_is_compatible_clause_internal
> when processing AND/OR/NOT clauses.
> 
> I plan to look into those items next, but I don't want block review of
> other parts of the patch unnecessarily.

I briefly looked it and have some comments.

0001-multivariate-MCV-lists-20190312.patch

+/*
+ * bms_member_index
+ *             determine 0-based index of the varattno in the bitmap
+ *
+ * Returns (-1) when the value is not a member.

I think the comment should be more generic.

"determine 0-based index of member x among the bitmap members"
" Returns -1 when x is not a member."


(cont'ed)
+       if (a == NULL)
+               return 0;

Isn't the case of "not a member"?

bms_member_index seems working differently than maybe expected.

 bms_member_index((2, 4), 0) => 0, (I think) should be -1
 bms_member_index((2, 4), 1) => 0, should be -1
 bms_member_index((2, 4), 2) => 0, should be 0
 bms_member_index((2, 4), 3) => 1, should be -1
 bms_member_index((2, 4), 4) => 1, should be 1
 bms_member_index((2, 4), 5) => 2, should be -1
 bms_member_index((2, 4), 6) => 2, should be -1
...
 bms_member_index((2, 4), 63) => 2, should be -1
 bms_member_index((2, 4), 64) => -1, correct

It works correctly only when x is a member - the way the function
is maybe actually used in this patch -, or needs to change the
specifiction (or the comment) of the function.




+       if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
+       {
+               /*
+                * Estimate selectivity on any clauses applicable by stats 
tracking
+                * actual values first, then apply functional dependencies on 
the
+                * remaining clauses.

The comment doesn't seem needed since it is mentioning the detail
of statext_clauselist_selectivity() called just below.



+               if (statext_is_kind_built(htup, STATS_EXT_MCV))
+               {
+                       StatisticExtInfo *info = makeNode(StatisticExtInfo);
+
+                       info->statOid = statOid;
+                       info->rel = rel;
+                       info->kind = STATS_EXT_MCV;
+                       info->keys = bms_copy(keys);
+
+                       stainfos = lcons(info, stainfos);
+               }


We are to have four kinds of extended statistics, at worst we
have a list containing four StatisticExtInfos with the same
statOid, rel, keys and only different kind. Couldn't we reverse
the structure so that StatisticExtIbfo be something like:

>  struct StatsticExtInfo
>  {
>     NodeTag   type;
>     Oid        statOid;
>     RelOptInfo *rel;
!     char       kind[8];  /* arbitrary.. */
>     Bitmapset *keys;     



+OBJS = extended_stats.o dependencies.o mcv.o mvdistinct.o

The module for MV distinctness is named 'mvdistinct', but mcv
doesn't have the prefix. I'm not sure we need to unify the
names, though.


+Multivariate MCV (most-common values) lists are a straightforward extension of

  "lists are *a*" is wrong?



@@ -223,26 +220,16 @@ dependency_degree(int numrows, HeapTuple *rows, int k, 
AttrNumber *dependency,

I haven't read it in datil, but why MV-MCV patch contains (maybe)
improvement of functional dependency code?



+int
+compare_scalars_simple(const void *a, const void *b, void *arg)

Seems to need a comment. "compare_scalars without tupnoLink maininance"?


+int
+compare_datums_simple(Datum a, Datum b, SortSupport ssup)
+{
+       return ApplySortComparator(a, false, b, false, ssup);
+}

This wrapper function doesn't seem to me required.



+/* simple counterpart to qsort_arg */
+void *
+bsearch_arg(const void *key, const void *base, size_t nmemb, size_t size,

We have some functions named *_bsearch. If it is really
qsort_arg's bsearch versoin, it might be better to be placed in
qsort_arg.c or new file bsearch_arg.c?


+int *
+build_attnums_array(Bitmapset *attrs)

If the attrs is not offset, I'd like that it is named
differently, say, attrs_nooffset or something.


+       int                     i,
+                               j,
+                               len;

I'm not sure but is it following our coding convention?


+                       items[i].values[j] = heap_getattr(rows[i],

items is needed by qsort_arg and as return value. It seems to me
that using just values[] and isnull[] make the code simpler there.


+       /* Look inside any binary-compatible relabeling (as in 
examine_variable) */
+       if (IsA(clause, RelabelType))
+               clause = (Node *) ((RelabelType *) clause)->arg;

This is quite a common locution so it's enough that the comment
just mention what it does, like "Remove any relabel
decorations". And relabelling can happen recursively so the 'if'
should be 'while'?


+               /* we also better ensure the Var is from the current level */
+               if (var->varlevelsup > 0)
+                       return false;


I don't get the meaning of the "better". If it cannot/don't
accept subquery's output, it would be "we refuse Vars from ...",
or if the function is not assumed to receive such Vars, it should
be an assertion.




+               /* see if it actually has the right shape (one Var, one Const) 
*/
+               ok = (NumRelids((Node *) expr) == 1) &&
+                       (is_pseudo_constant_clause(lsecond(expr->args)) ||
+                        (varonleft = false,
+                         is_pseudo_constant_clause(linitial(expr->args))));

I don't think such "expression" with unidentifiable side-effect
is a good thing. Counldn't it in more plain code? (Yeah, it is
already used in clauselist_selectivity so I don't insist on
that.)


+                * This uses the function for estimating selectivity, not the 
operator
+                * directly (a bit awkward, but well ...).

Not only it is the right thing but actually the operators for the
type path don't have operrst.



+ * statext_is_compatible_clause
+ *             Determines if the clause is compatible with MCV lists.

I think the name should contain the word "mcv". Isn't the name
better to be "staext_clause_is_mcv_compatibe"?


(Sorry, further comments may come later..)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Reply via email to