On Mon, 12 Nov 2018 at 17:35, Edmund Horner <ejr...@gmail.com> wrote: > Hi, here's the new patch(s). > > Mostly the same, but trying to address your comments from earlier as > well as clean up a few other things I noticed.
Thanks for making those changes. I've now had a look over the latest patches and I've found a few more things. Many of these are a bit nitpicky, but certainly not all. I also reviewed 0004 this time. 0001: 1. The row estimates are not quite right. This cases the row estimation to go the wrong way for isgt. For example, the following gets 24 rows instead of 26. postgres=# create table t (a int); CREATE TABLE postgres=# insert into t select generate_Series(1,100); INSERT 0 100 postgres=# analyze t; postgres=# explain analyze select * from t where ctid >= '(0,75)'; QUERY PLAN --------------------------------------------------------------------------------------------- Seq Scan on t (cost=0.00..2.25 rows=24 width=4) (actual time=0.046..0.051 rows=26 loops=1) Filter: (ctid >= '(0,75)'::tid) Rows Removed by Filter: 74 Planning Time: 0.065 ms Execution Time: 0.074 ms (5 rows) The < and <= case is not quite right either. < should have 1 fewer tuple than the calculated average tuples per page, and <= should have the same (assuming no gaps) I've attached a small delta patch that I think improves things here. 0002: 2. You should test for a non-empty List with list != NIL /* * If no quals were specified, then a complete scan is assumed. Make a * TidExpr with an empty list of TidOpExprs. */ if (!node->tidquals) Also, can you not just return after that if test? I think the code would be easier to read with it like that. 3. I'd rather see EnsureTidRangeSpace() keep doubling the size of the allocation until it reaches the required size. See how MakeSharedInvalidMessagesArray() does it. Doing it this way ensures we always have a power of two sized array which is much nicer if we ever reach the palloc() limit as if the array is sized at the palloc() limit / 2 + 1, then if we try to double it'll fail. Of course, it's unlikely to be a problem here, but... the question would be how to decide on the initial size. 4. "at" needs shifted left a couple of words /* * If the lower bound was already or above at the maximum block * number, then there is no valid range. */ but I don't see how it could be "or above". The ctid type does not have the room for that. Although, that's not to say you should test if (block == MaxBlockNumber), the >= seems better for the code. I'm just complaining about the comment. 5. TidInArrayExprEval() lacks a header comment, and any other comments to mention what it does. The function args also push over the 80 char line length. There's also a few other functions in nodeTidscan.c that are missing a header comment. 6. In MergeTidRanges(), you have: ItemPointerData a_last = a->last; ItemPointerData b_last; if (!ItemPointerIsValid(&a_last)) a_last = a->first; but I don't see anywhere you're setting ->last to an invalid item pointer. Is this left over from a previous design of the range scan? It looks like in TidExprEval() you're setting the upperbound to the last page on the relation. 7. "fist" -> "first" * If the tuple is in the fist block of the range and before the first 8. tss_TidRangePtr is a pretty confusingly named field. if (node->tss_TidRangePtr >= numRanges || node->tss_TidRangePtr < 0) break; I'd expect anything with ptr in it to be a pointer, but this seems to be an array index. Maybe "idx" is better than "ptr", or take note from nodeAppend.c and have something like "tts_whichRange". UPDATE: I see you've just altered what's there already. Perhaps it's okay to leave it as you have it, but it's still not ideal. 9. This comment seems to indicate that a range can only have one bound, but that does not seem to be the case. * Ranges with only one item -- including one resulting from a * CURRENT-OF qual -- are handled by looking up the item directly. It seems open bounded ranges just have the lowest or highest possible value for a ctid on the open side. Perhaps the comment could be written as: /* * For ranges containing a single tuple, we can simply make an * attempt to fetch the tuple directly. */ 10. In cost_tidscan() I think you should ceil() the following: double pages = selectivity * baserel->pages; Otherwise, you'll end up partially charging a seq_page_cost, which seems pretty invalid since you can't partially read a page. 11. In the comment: /* TODO decide what the costs should be */ I think you can just explain why you're charging 1 random_page_cost and the remainder in seq_page_cost. Or is there something left to do here that I've forgotten about? 12. expected_comparison_operator is a bit long a name: IsTidComparison(OpExpr *node, int varno, Oid expected_comparison_operator) How about just expected_opno? 13. !rlst -> rlst != NIL /* if no range qual was found, look for any other TID qual */ if (!rlst) (Yeah I know there's various cases where it's done incorrectly there already :-( ) 14. This is not great: #define IsTidEqualClause(node, varno) IsTidComparison(node, varno, TIDEqualOperator) #define IsTidLTClause(node, varno) IsTidComparison(node, varno, TIDLessOperator) #define IsTidLEClause(node, varno) IsTidComparison(node, varno, TIDLessEqOperator) #define IsTidGTClause(node, varno) IsTidComparison(node, varno, TIDGreaterOperator) #define IsTidGEClause(node, varno) IsTidComparison(node, varno, TIDGreaterEqOperator) #define IsTidRangeClause(node, varno) (IsTidLTClause(node, varno) || \ IsTidLEClause(node, varno) || \ IsTidGTClause(node, varno) || \ IsTidGEClause(node, varno)) The 4 macros for >, >=, < and <= are only used by IsTidRangeClause() which means IsTidComparison() could get called up to 4 times. Most of the work it does would be redundant in that case. Maybe it's better to rethink that? 15. There's no field named NumTids: * TidRanges evaluated item pointers (array of size NumTids) 0003: 16. I think the following comment needs to be updated: /* start from last page of the scan */ to: /* When scanning the whole relation, start from the last page of the scan */ and drop: /* Scanning the full relation: start just before start block. */ then maybe change: /* Scanning a restricted range: start at end of range. */ to /* Otherwise, if scanning just a subset of the relation, start at the final block in the range */ 0004: 17. Can you make a few changed to build_tidscan_pathkeys(): a. build_index_pathkeys() uses ScanDirectionIsBackward(scandir), can you set the opno based on that rather than doing "direction == ForwardScanDirection" b. varexpr can be an Expr and just be named expr. Please move the declaration and assignment out onto separate lines and wrap the long line. c. wrap long line with the call to build_expression_pathkey(). Get rid of the (Expr *) cast. 18. I'd expect the following not to produce a sort above the Tid Scan. postgres=# set enable_seqscan=0; SET postgres=# explain select * from t inner join t t1 on t.ctid = t1.ctid where t.ctid < '(0,10)' ; QUERY PLAN --------------------------------------------------------------------------------------- Merge Join (cost=10000000008.65..10000000009.28 rows=9 width=8) Merge Cond: (t.ctid = t1.ctid) -> Sort (cost=3.33..3.35 rows=9 width=10) Sort Key: t.ctid -> Tid Scan on t (cost=0.00..3.18 rows=9 width=10) TID Cond: (ctid < '(0,10)'::tid) -> Sort (cost=10000000005.32..10000000005.57 rows=100 width=10) Sort Key: t1.ctid -> Seq Scan on t t1 (cost=10000000000.00..10000000002.00 rows=100 width=10) (9 rows) On looking at why the planner did this, I see it's down to how you've coded create_tidscan_paths(). You're creating a tidpath if there's any quals or any useful pathkeys useful to the query's ORDER BY, but only including the pathkeys if they're useful for the query's ORDER BY. I think it'll be better to include the forward pathkeys in all cases, and just make it a backward Tid Scan if backward keys are useful for the ORDER BY. There's still a problem with this as a Merge Join would need a Sort if there was an ORDER BY ctid DESC for one relation even if the other relation had some valid ctid quals since the 2nd scan would create a forward Tid Scan. Maybe that's not worth worrying about. The only fix I can imagine is to always create a forward and backward Tid Scan path, which is pretty bad as it's two more paths that likely won't get used 99.9% of the time. This also caused me to notice the costs are pretty broken for this: postgres=# explain select * from t order by ctid; QUERY PLAN --------------------------------------------------- Tid Scan on t (cost=0.00..0.00 rows=100 width=10) (1 row) 19. Looks like the ScanDirection's normally get named "scandir": static TidScan *make_tidscan(List *qptlist, List *qpqual, Index scanrelid, List *tidquals, ScanDirection direction); Likewise for the various .h files you've added that as a new field to various structs. Setting back to waiting on author. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
fixes_for_v4_0001.diff
Description: Binary data