Re: [PERFORM] best way to fetch next/prev record based on index
Greg Stark <[EMAIL PROTECTED]> writes: > Fixing it to write out complex boolean expressions wouldn't be too hard, but > I'm not clear it would be worth it, since I suspect the end result would be as > the comment indicates, to introduce a new runtime node. Just to prove that it isn't really all that hard, I took a stab at doing this. This is basically only my second attempt to write even trivial bits of server backend code so I certainly don't suggest anyone try using this code. In fact it doesn't quite compile, because I have a bit of confusion between the char *oprname and List *opname variables. Now I could clear that up, but I'm missing one piece of the puzzle. To make it work I do need a way to construct a List *opname from ">" or "=" and I don't know how to do that. I think that's all I'm missing, but perhaps in the morning I'll look at this code and wonder "what was I thinking?!" This approach won't get the optimizer to actually use an index for these comparisons, but it will fix the semantics to match the spec. Later we can either improve the optimizer to detect expressions like this (which I think would be cooler since some users may write them by hand and not use the row-expression approach, but I don't see how to do it), or introduce a new run-time node and have the optimizer handle it. But at least we won't have to worry about backwards-compatibility issues with the semantics changing. Oh, I tried to stick to the style, but sometimes I couldn't help myself. I suppose I would have to fix up the style the rest of the way if I got it working and you wanted a patch to apply. /* * Transform a "row op row" construct */ static Node * make_row_op(ParseState *pstate, List *opname, Node *ltree, Node *rtree) { Node *result = NULL; RowExpr*lrow, *rrow; List *largs, *rargs; char *oprname; /* Inputs are untransformed RowExprs */ lrow = (RowExpr *) transformExpr(pstate, ltree); rrow = (RowExpr *) transformExpr(pstate, rtree); Assert(IsA(lrow, RowExpr)); Assert(IsA(rrow, RowExpr)); largs = lrow->args; rargs = rrow->args; if (list_length(largs) != list_length(rargs)) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("unequal number of entries in row expression"))); oprname = strVal(llast(opname)); if (strcmp(oprname, "=") == 0) { result = make_row_op_simple(pstate, "=", largs, rargs); } else if (strcmp(oprname, "<>") == 0) { result = make_row_op_simple(pstate, "<>", largs, rargs); } else if ((strcmp(oprname, "<") == 0) || (strcmp(oprname, ">") == 0)) { result = make_row_op_complex(pstate, oprname, largs, rargs); } /* alternatively these last two could just create negated < and > * expressions. Which is better depends on whether the extra clause * confuses the optimizer more or less than having to push the NOTs down */ else if (strcmp(oprname, ">=") == 0) { Node *branch = make_row_op_simple(pstate, "=", largs, rargs); result = make_row_op_complex(pstate, ">", largs, rargs); result = (Node *) makeBoolExpr(OR_EXPR, list_make2(result, branch)); } else if (strcmp(oprname, "<=") == 0) { Node *branch = make_row_op_simple(pstate, "=", largs, rargs); result = make_row_op_complex(pstate, "<", largs, rargs); result = (Node *) makeBoolExpr(OR_EXPR, list_make2(result, branch)); } else { ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("operator %s is not supported for row expressions", oprname))); } return result; } /* * Handle something like * (A,B,C) = (X,Y,Z) * By constructing * (A=X) AND (B=Y) AND (C=Z) * */ static Node * make_row_op_simple(ParseState *pstate, char *oprname, List *largs, List *rargs) { ListCell *l, *r; BoolExprType boolop; Node *result; boolop = strcmp(oprname, "<>")==0 ? OR_EXPR : AND_EXPR; forboth(l, largs, r, rargs) { Node*larg = (Node *) lfirst(l); Node*rarg = (Node *) lfirst(r); Node*cmp; cmp = (Node *) make_op(pstate, opname, larg, rarg); cmp = coerce_to_boolean(pstate, cmp, "row comparison"); if (result == NULL) result = cmp; else result = (Node *) makeBoolExpr(boolop, list_make2(result, cmp)); } if (result == NULL) { /* zero-length rows? Generate constant TRUE or FALSE */ if (boolop == AND_EXPR) result = makeBoolConst(true, false); else result = makeBoolConst(false, false); } return result; } /* * Handles something like: * (A,B,C) > (X,Y,Z) * * By constructing something like
Re: [PERFORM] best way to fetch next/prev record based on index
> Greg Stark <[EMAIL PROTECTED]> writes: > This approach won't get the optimizer to actually use an index for these > comparisons, but it will fix the semantics to match the spec. Later we can > either improve the optimizer to detect expressions like this (which I > think > would be cooler since some users may write them by hand and not use the > row-expression approach, but I don't see how to do it), or introduce a new > run-time node and have the optimizer handle it. But at least we won't have > to > worry about backwards-compatibility issues with the semantics changing. > > Oh, I tried to stick to the style, but sometimes I couldn't help myself. I > suppose I would have to fix up the style the rest of the way if I got it > working and you wanted a patch to apply. Regarding the <= and >= operators: can you apply them in the complex pass? If you can, this might be more efficient. > /* > * Handles something like: > * (A,B,C) > (X,Y,Z) > * > * By constructing something like: > * ( ( A > X) OR (A=X AND B>Y) OR (A=X AND B=Y AND C>Z) ) > * ^ > */ | the last comparison of the last major clause (or the only comparison for a single field row construct) is a special case. In > cases use >, in >= cases use >=, etc.; this is logical equivalent to doing or of simple = intersected with complex >. Is this step of the transformation visible to the optimizer/planner? For purely selfish reasons, it would be really nice if a field by field row construction could get a fast path to the index if the fields match the index fields. Merlin ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] best way to fetch next/prev record based on index
Greg Stark <[EMAIL PROTECTED]> writes: > Removing <,<=,>,>= would be trivial. ... and not backwards-compatible. If we did that then cases involving unlabeled row expressions would no longer work as they did in prior releases. For example select (1,2,3) < (4,5,6); is accepted by all releases back to 7.0, and probably much further (but 7.0 is the oldest I have handy to test). The only reason the code in parse_expr.c appears new is that the functionality used to be in gram.y. I'd like to see this fixed to comply with the spec, but given the lack of complaints about the existing behavior over so many years, ripping it out meanwhile doesn't seem appropriate. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] best way to fetch next/prev record based on index
> Greg Stark <[EMAIL PROTECTED]> writes: > > Removing <,<=,>,>= would be trivial. > > ... and not backwards-compatible. If we did that then cases involving > unlabeled row expressions would no longer work as they did in prior > releases. For example > > select (1,2,3) < (4,5,6); > > is accepted by all releases back to 7.0, and probably much further (but > 7.0 is the oldest I have handy to test). The only reason the code in > parse_expr.c appears new is that the functionality used to be in gram.y. > > I'd like to see this fixed to comply with the spec, but given the lack > of complaints about the existing behavior over so many years, ripping > it out meanwhile doesn't seem appropriate. Just to clarify: I think Greg is arguing to bring pg to SQL 92 spec and not remove anything. ISTM the SQL standard was designed with exactly my problem in mind: how to get the next key in a table. IMHO, relying on select (1,2,3) < (4,5,6); to give a result which is neither standard nor documented seems to be bad style. The current methodology could cause pg to give incorrect results in TPC benchmarks...not good. Also, it's trivial to rewrite that comparison with the old behavior using 'and'. OTOH, it is not trivial to rewrite the comparison to do it the correct way...it's kind of an SQL 'trick question'. Most likely, a very small minority of pg users are even away of the above syntax anyways. To be fair, I'm a big fan of deprecating features for at least one release for compatibility reasons. It's no big deal to me, because I'm already writing the queries out the long way anyways. My interests are in the optimizer. If there is a way to enhance it so that it multi-column comparisons in a logical way, that would be great. Is this theoretically possible (probable)? Merlin ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] best way to fetch next/prev record based on index
Tom Lane <[EMAIL PROTECTED]> writes: > The only reason the code in parse_expr.c appears new is that the > functionality used to be in gram.y. Ah, that was what I was missing. Though it's odd since it seems there was code in parse_expr.c to handle the "=" case specially. > I'd like to see this fixed to comply with the spec, but given the lack > of complaints about the existing behavior over so many years, ripping > it out meanwhile doesn't seem appropriate. I tried my hand at this last night and think I did an ok first pass. But I'm missing one piece of the puzzle to get it to compile. What do I need to know to be able to construct a List* suitable for passing as the second arg to make_op() knowing only that I want to create a List* to represent "=" or "<" or so on? I also had another question I didn't ask in the other email. In the midst of a forboth() loop, how would I tell if I'm at the last element of the lists? Would lnext(l)==NULL do it? -- greg ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] best way to fetch next/prev record based on index
Greg Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: >> The only reason the code in parse_expr.c appears new is that the >> functionality used to be in gram.y. > Ah, that was what I was missing. Though it's odd since it seems there was code > in parse_expr.c to handle the "=" case specially. IIRC, the case involving a subselect, eg ... WHERE (1,2) = ANY (SELECT a, b FROM foo) ... has always been handled in parse_expr.c, but cases involving simple rows were previously expanded in gram.y. One of the reasons I moved the logic over to parse_expr.c was the thought that it would be easier to do it right in parse_expr.c --- gram.y would not be allowed to look up related operators, which seems necessary to handle the construct per spec. > I tried my hand at this last night and think I did an ok first pass. The main issue in my mind is whether to invent a separate node type for row comparisons. This is probably a good idea for a number of reasons, the most obvious being that there's no way to avoid multiple evaluations of the subexpressions if you try to expand it into simple comparisons. Also it seems likely that the planner would find it easier to recognize the relationship to a multicolumn index than if the thing is expanded. (But teaching the planner and the index mechanisms themselves about this is going to be a major project in any case.) One thing I did not like about your first pass is that it makes unsupportable assumptions about there being a semantic relationship between operators named, say, '<' and '<='. Postgres used to have such bogosity in a number of places but we've managed to get rid of most of it. (Offhand I think the only remaining hard-wired assumption about operators of particular names having particular semantics is that the foreign key mechanisms assume '=' must be the right operator to compare keys with. Eventually we need to get rid of that too.) IMHO the right way to do this is to look up a suitable btree operator class and use the appropriate member operators of that class. (In a separate-node-type implementation, we'd probably ignore the operators as such altogether, and just call the btree comparison function of the opclass.) It's not entirely clear to me how to select the opclass when the initially given inputs are of different types, though. In the present code we leave it to oper() to do the right thing, including possibly coercing the inputs to matching types. Possibly we should still apply oper(), but then insist that the selected operator appear as a btree opclass member, comparable to the way we handle sort operators now. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] best way to fetch next/prev record based on index
Tom Lane <[EMAIL PROTECTED]> writes: > One thing I did not like about your first pass is that it makes > unsupportable assumptions about there being a semantic relationship > between operators named, say, '<' and '<='. Hm, I think I even had caught that issue on the mailing list previously. In that case though, it seems even the existing code is insufficient. Instead of testing whether the operator with strcmp against "=" and "<>" it should perhaps be looking for an operator class and the strategy number for the operator and its negator. -- greg ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] best way to fetch next/prev record based on index
Greg Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: >> One thing I did not like about your first pass is that it makes >> unsupportable assumptions about there being a semantic relationship >> between operators named, say, '<' and '<='. > In that case though, it seems even the existing code is insufficient. Well, yeah, we know the existing code is broken ;-) > Instead of testing whether the operator with strcmp against "=" and > "<>" it should perhaps be looking for an operator class and the > strategy number for the operator and its negator. Probably. You can find some relevant code in indxpath.c in the stuff that tries to determine whether partial indexes are relevant. I think that the ideal behavior is that we not look directly at the operator name at all. For example it's not too unreasonable to want to write (a,b) ~<~ (c,d) if you have an index that uses those non-locale-aware operators. We should find the operator that matches the name and input arguments, and then try to make sense of the operator semantics by matching it to btree opclasses. Note that it's possible to find multiple matches, for example if someone has installed a "reverse sort" opclass. I think we would want to prefer a match in the datatype's default opclass, if there is one, but otherwise we can probably follow the lead of the existing code and assume that any match is equally good. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly