Re: Tid scan improvements

Edmund Horner Sun, 11 Nov 2018 20:41:29 -0800

Hi, here's the new patch(s).

Mostly the same, but trying to address your comments from earlier as
well as clean up a few other things I noticed.


Cheers,
Edmund

On Fri, 9 Nov 2018 at 15:01, Edmund Horner <ejr...@gmail.com> wrote:
>
> On Tue, 6 Nov 2018 at 16:40, David Rowley <david.row...@2ndquadrant.com> 
> wrote:
> > I've been looking over 0001 to 0003. I ran out of steam before 0004.
>
> Hi David, thanks for another big review with lots of improvements.
>
> > I like the design of the new patch. From what I threw so far at the
> > selectivity estimation code, it seems pretty good.  I also quite like
> > the design in nodeTidscan.c for range scans.
>
>
> > I didn't quite manage to wrap my head around the code that removes
> > redundant quals from the tidquals. For example, with:
> >
> > postgres=# explain select * from t1 where ctid <= '(0,10)' and a = 0;
> >                     QUERY PLAN
> > --------------------------------------------------
> >  Tid Scan on t1  (cost=0.00..3.19 rows=1 width=4)
> >    TID Cond: (ctid <= '(0,10)'::tid)
> >    Filter: (a = 0)
> > (3 rows)
> >
> > and:
> >
> > postgres=# explain select * from t1 where ctid <= '(0,10)' or a = 20
> > and ctid >= '(0,0)';
> >                                   QUERY PLAN
> > ------------------------------------------------------------------------------
> >  Tid Scan on t1  (cost=0.01..176.18 rows=12 width=4)
> >    TID Cond: ((ctid <= '(0,10)'::tid) OR (ctid >= '(0,0)'::tid))
> >    Filter: ((ctid <= '(0,10)'::tid) OR ((a = 20) AND (ctid >= 
> > '(0,0)'::tid)))
> > (3 rows)
> >
> > I understand why the 2nd query didn't remove the ctid quals from the
> > filter, and I understand why the first query could. I just didn't
> > manage to convince myself that the code behaves correctly for all
> > cases.
>
> I agree it's not obvious.
>
> 1. We extract a set of tidquals that can be directly implemented by
> the Tid scan.  This set is of the form:  "(CTID op ? AND ...) OR
> (...)" (with some limitations).
> 2. If they happened to come verbatim from the original RestrictInfos,
> then they will be found in scan_clauses, and we can remove them.
> 3. If they're not verbatim, i.e. the original RestrictInfos have
> additional criteria that the Tid scan can't use, then tidquals won't
> match anything in scan_clauses, and hence scan_clauses will be
> unchanged.
> 4. We do a bit of extra work for the common and useful case of "(CTID
> op ? AND ...)".  Since the top-level operation of the input quals is
> an AND, it will typically be split into multiple RestrictInfo items.
> We remove each part from scan_clauses.
>
> > 1. I see a few instances of:
> >
> > #define DatumGetItemPointer(X) ((ItemPointer) DatumGetPointer(X))
> > #define ItemPointerGetDatum(X) PointerGetDatum(X)
> >
> > in both tid.c and ginfuncs.c, and I see you have:
> >
> > + itemptr = (ItemPointer) DatumGetPointer(constval);
> >
> > Do you think it would be worth moving the macros out of tid.c and
> > ginfuncs.c into postgres.h and use that macro instead?
> >
> > (I see the code in this file already did this, so it might not matter
> > about this)
>
> I'm not sure about this one - - I think it's better as a separate
> patch, since we'd also change ginfuncs.c.  I have left it alone for
> now.
>
> > 2. In TidCompoundRangeQualFromExpr() rlst is not really needed. You
> > can just return MakeTidRangeQuals(found_quals); or return NIL.
>
> Yup, gone.
>
> > 3. Can you explain why this only needs to take place when list_length() == 
> > 1?
> >
> > /*
> > * In the case of a compound qual such as "ctid > ? AND ctid < ? AND ...",
> > * the various parts will have come from different RestrictInfos.  So
> > * remove each part separately.
> > */
> > ...
>
> I've tried to improve the comment.
>
> > 4. Accidental change?
> >
> > - tidquals);
> > + tidquals
> > + );
> >
> > 5. Shouldn't this comment get changed?
> >
> > - * NumTids    number of tids in this scan
> > + * NumRanges    number of tids in this scan
> >
> > 6. There's no longer a field named NumTids
> >
> > - * TidList    evaluated item pointers (array of size NumTids)
> > + * TidRanges    evaluated item pointers (array of size NumTids)
> >
> > 7. The following field is not documented in TidScanState:
> >
> > + bool tss_inScan;
> >
> > 8. Can you name this exprtype instead?
> >
> > + TidExprType type; /* type of op */
> >
> > "type" is used by Node types to indicate their type.
>
> Yup, yup, yup, yup, yup.
>
> > 9. It would be neater this:
> >
> > if (expr->opno == TIDLessOperator || expr->opno == TIDLessEqOperator)
> > tidopexpr->type = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
> > else if (expr->opno == TIDGreaterOperator || expr->opno == 
> > TIDGreaterEqOperator)
> > tidopexpr->type = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
> > else
> > tidopexpr->type = TIDEXPR_EQ;
> >
> > tidopexpr->exprstate = exprstate;
> >
> > tidopexpr->inclusive = expr->opno == TIDLessEqOperator || expr->opno
> > == TIDGreaterEqOperator;
> >
> > as a switch: ...
>
> Yup, I think the switch is a bit nicer.
>
> > 10. I don't quite understand this comment:
> >
> > + * Create an ExprState corresponding to the value part of a TID comparison,
> > + * and wrap it in a TidOpExpr.  Set the type and inclusivity of the 
> > TidOpExpr
> > + * appropriately, depending on the operator and position of the its 
> > arguments.
> >
> > I don't quite see how the code sets the inclusivity depending on the
> > position of the arguments.
> >
> > Maybe the comment should be:
> >
> > + * For the given 'expr' build and return an appropriate TidOpExpr taking 
> > into
> > + * account the expr's operator and operand order.
>
> I'll go with your wording.
>
> > 11. ScalarArrayOpExpr are commonly named "saop": ...
>
> Yup.
>
> > 12. You need to code SetTidLowerBound() with similar wraparound logic
> > you have in SetTidUpperBound().
> >
> > It's perhaps unlikely, but the following shows incorrect results.
> >
> > postgres=# select ctid from t1 where ctid > '(0,65535)' limit 1;
> >  ctid
> > -------
> >  (0,1)
> > (1 row)
> >
> > -- the following is fine.
> >
> > Time: 1.652 ms
> > postgres=# select ctid from t1 where ctid >= '(0,65535)' limit 1;
> >  ctid
> > -------
> >  (1,1)
> > (1 row)
> >
> > Likely you can just upgrade to the next block when the offset is >
> > MaxOffsetNumber.
>
> This is important, thanks for spotting it.
>
> I've tried to add some code to handle this case (and also that of
> "ctid < '(0,0)'") with a couple of tests too.
>
> > 13. It looks like the previous code didn't make the assumption you're 
> > making in:
> >
> > + * A current-of TidExpr only exists by itself, and we should
> > + * already have allocated a tidList entry for it.  We don't
> > + * need to check whether the tidList array needs to be
> > + * resized.
> >
> > I'm not sure if it's a good idea to lock the executor code into what
> > the grammar currently says is possible. The previous code didn't
> > assume that.
>
> Fair enough, I've restored the previous code without the assumption.
>
> > 14. we pass 'false' to what?
>
> Obsolete comment (see reply to Alvaro).
>
> I've applied most of these, and I'll post a new patch soon.

v4-0001-Add-selectivity-and-nullness-estimates-for-CTID-syst.patch
Description: Binary data

v4-0003-Support-backward-scans-over-restricted-ranges-in-hea.patch
Description: Binary data

v4-0002-Support-range-quals-in-Tid-Scan.patch
Description: Binary data

v4-0004-Tid-Scan-results-are-ordered.patch
Description: Binary data

Re: Tid scan improvements

Reply via email to