Re: [PERFORM] best way to fetch next/prev record based on index

2004-07-28 Thread Greg Stark

Greg Stark <[EMAIL PROTECTED]> writes:

> Fixing it to write out complex boolean expressions wouldn't be too hard, but
> I'm not clear it would be worth it, since I suspect the end result would be as
> the comment indicates, to introduce a new runtime node.

Just to prove that it isn't really all that hard, I took a stab at doing this.
This is basically only my second attempt to write even trivial bits of server
backend code so I certainly don't suggest anyone try using this code.

In fact it doesn't quite compile, because I have a bit of confusion between
the char *oprname and List *opname variables. Now I could clear that up, but
I'm missing one piece of the puzzle. To make it work I do need a way to
construct a List *opname from ">" or "=" and I don't know how to do that.

I think that's all I'm missing, but perhaps in the morning I'll look at this
code and wonder "what was I thinking?!"


This approach won't get the optimizer to actually use an index for these
comparisons, but it will fix the semantics to match the spec. Later we can
either improve the optimizer to detect expressions like this (which I think
would be cooler since some users may write them by hand and not use the
row-expression approach, but I don't see how to do it), or introduce a new
run-time node and have the optimizer handle it. But at least we won't have to
worry about backwards-compatibility issues with the semantics changing.

Oh, I tried to stick to the style, but sometimes I couldn't help myself. I
suppose I would have to fix up the style the rest of the way if I got it
working and you wanted a patch to apply.


/*
 * Transform a "row op row" construct
 */
static Node *
make_row_op(ParseState *pstate, List *opname, Node *ltree, Node *rtree)
{
Node   *result = NULL;
RowExpr*lrow,
   *rrow;
List   *largs,
   *rargs;
char   *oprname;

/* Inputs are untransformed RowExprs */
lrow = (RowExpr *) transformExpr(pstate, ltree);
rrow = (RowExpr *) transformExpr(pstate, rtree);
Assert(IsA(lrow, RowExpr));
Assert(IsA(rrow, RowExpr));
largs = lrow->args;
rargs = rrow->args;

if (list_length(largs) != list_length(rargs))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
 errmsg("unequal number of entries in row expression")));

oprname = strVal(llast(opname));

if (strcmp(oprname, "=") == 0) 
{
result = make_row_op_simple(pstate, "=", largs, rargs);
}

else if (strcmp(oprname, "<>") == 0) 
{
result = make_row_op_simple(pstate, "<>", largs, rargs);
}

else if ((strcmp(oprname, "<") == 0) ||
 (strcmp(oprname, ">") == 0)) 
{
result = make_row_op_complex(pstate, oprname, largs, rargs);
}

/* alternatively these last two could just create negated < and >
 * expressions. Which is better depends on whether the extra clause
 * confuses the optimizer more or less than having to push the NOTs down 
 */

else if (strcmp(oprname, ">=") == 0)
{
Node *branch = make_row_op_simple(pstate, "=", largs, rargs);
result = make_row_op_complex(pstate, ">", largs, rargs);
result = (Node *) makeBoolExpr(OR_EXPR, list_make2(result, branch));
}

else if (strcmp(oprname, "<=") == 0)
{
Node *branch = make_row_op_simple(pstate, "=", largs, rargs);
result = make_row_op_complex(pstate, "<", largs, rargs);
result = (Node *) makeBoolExpr(OR_EXPR, list_make2(result, branch));
}


else
{
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("operator %s is not supported for row expressions",
oprname)));
}

return result;
}

/*
 * Handle something like
 * (A,B,C) = (X,Y,Z)
 * By constructing
 * (A=X) AND (B=Y) AND (C=Z)
 *
 */

static Node *
make_row_op_simple(ParseState *pstate, char *oprname, 
   List *largs, List *rargs)
{
ListCell *l, *r;
BoolExprType boolop;
Node *result;

boolop = strcmp(oprname, "<>")==0 ? OR_EXPR : AND_EXPR;

forboth(l, largs, r, rargs)
{
Node*larg = (Node *) lfirst(l);
Node*rarg = (Node *) lfirst(r);
Node*cmp;

cmp = (Node *) make_op(pstate, opname, larg, rarg);
cmp = coerce_to_boolean(pstate, cmp, "row comparison");
if (result == NULL)
result = cmp;
else
result = (Node *) makeBoolExpr(boolop,
   list_make2(result, cmp));
}

if (result == NULL)
{
/* zero-length rows?  Generate constant TRUE or FALSE */
if (boolop == AND_EXPR)
result = makeBoolConst(true, false);
else
result = makeBoolConst(false, false);
}

return result;
}


/*
 * Handles something like:
 * (A,B,C) > (X,Y,Z)
 *
 * By constructing something like

Re: [PERFORM] best way to fetch next/prev record based on index

2004-07-28 Thread Merlin Moncure
> Greg Stark <[EMAIL PROTECTED]> writes:
> This approach won't get the optimizer to actually use an index for
these
> comparisons, but it will fix the semantics to match the spec. Later we
can
> either improve the optimizer to detect expressions like this (which I
> think
> would be cooler since some users may write them by hand and not use
the
> row-expression approach, but I don't see how to do it), or introduce a
new
> run-time node and have the optimizer handle it. But at least we won't
have
> to
> worry about backwards-compatibility issues with the semantics
changing.
> 
> Oh, I tried to stick to the style, but sometimes I couldn't help
myself. I
> suppose I would have to fix up the style the rest of the way if I got
it
> working and you wanted a patch to apply.

Regarding the <= and >= operators:  can you apply them in the complex
pass? If you can, this might be more efficient.

> /*
>  * Handles something like:
>  * (A,B,C) > (X,Y,Z)
>  *
>  * By constructing something like:
>  * ( ( A > X) OR (A=X AND B>Y) OR (A=X AND B=Y AND C>Z) )
>  *  ^
>  */ |

the last comparison of the last major clause (or the only comparison for
a single field row construct) is a special case.  In > cases use >, in
>= cases use >=, etc.; this is logical equivalent to doing or of simple
= intersected with complex >.  

Is this step of the transformation visible to the optimizer/planner?
For purely selfish reasons, it would be really nice if a field by field
row construction could get a fast path to the index if the fields match
the index fields.

Merlin

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] best way to fetch next/prev record based on index

2004-07-28 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> Removing <,<=,>,>= would be trivial.

... and not backwards-compatible.  If we did that then cases involving
unlabeled row expressions would no longer work as they did in prior
releases.  For example

select (1,2,3) < (4,5,6);

is accepted by all releases back to 7.0, and probably much further (but
7.0 is the oldest I have handy to test).  The only reason the code in
parse_expr.c appears new is that the functionality used to be in gram.y.

I'd like to see this fixed to comply with the spec, but given the lack
of complaints about the existing behavior over so many years, ripping
it out meanwhile doesn't seem appropriate.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] best way to fetch next/prev record based on index

2004-07-28 Thread Merlin Moncure
> Greg Stark <[EMAIL PROTECTED]> writes:
> > Removing <,<=,>,>= would be trivial.
> 
> ... and not backwards-compatible.  If we did that then cases involving
> unlabeled row expressions would no longer work as they did in prior
> releases.  For example
> 
>   select (1,2,3) < (4,5,6);
> 
> is accepted by all releases back to 7.0, and probably much further
(but
> 7.0 is the oldest I have handy to test).  The only reason the code in
> parse_expr.c appears new is that the functionality used to be in
gram.y.
> 
> I'd like to see this fixed to comply with the spec, but given the lack
> of complaints about the existing behavior over so many years, ripping
> it out meanwhile doesn't seem appropriate.

Just to clarify:
I think Greg is arguing to bring pg to SQL 92 spec and not remove
anything.  ISTM the SQL standard was designed with exactly my problem in
mind: how to get the next key in a table. 

IMHO, relying 
on select (1,2,3) < (4,5,6); 
to give a result which is neither standard nor documented seems to be
bad style.  The current methodology could cause pg to give incorrect
results in TPC benchmarks...not good.  Also, it's trivial to rewrite
that comparison with the old behavior using 'and'.  OTOH, it is not
trivial to rewrite the comparison to do it the correct way...it's kind
of an SQL 'trick question'.  Most likely, a very small minority of pg
users are even away of the above syntax anyways.

To be fair, I'm a big fan of deprecating features for at least one
release for compatibility reasons.  It's no big deal to me, because I'm
already writing the queries out the long way anyways.  My interests are
in the optimizer.  If there is a way to enhance it so that it
multi-column comparisons in a logical way, that would be great.  Is this
theoretically possible (probable)?

Merlin


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] best way to fetch next/prev record based on index

2004-07-28 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> The only reason the code in parse_expr.c appears new is that the
> functionality used to be in gram.y.

Ah, that was what I was missing. Though it's odd since it seems there was code
in parse_expr.c to handle the "=" case specially.

> I'd like to see this fixed to comply with the spec, but given the lack
> of complaints about the existing behavior over so many years, ripping
> it out meanwhile doesn't seem appropriate.

I tried my hand at this last night and think I did an ok first pass. But I'm
missing one piece of the puzzle to get it to compile.

What do I need to know to be able to construct a List* suitable for passing as
the second arg to make_op() knowing only that I want to create a List* to
represent "=" or "<" or so on?

I also had another question I didn't ask in the other email. In the midst of a
forboth() loop, how would I tell if I'm at the last element of the lists?
Would lnext(l)==NULL do it?

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] best way to fetch next/prev record based on index

2004-07-28 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> The only reason the code in parse_expr.c appears new is that the
>> functionality used to be in gram.y.

> Ah, that was what I was missing. Though it's odd since it seems there was code
> in parse_expr.c to handle the "=" case specially.

IIRC, the case involving a subselect, eg
... WHERE (1,2) = ANY (SELECT a, b FROM foo) ...
has always been handled in parse_expr.c, but cases involving simple
rows were previously expanded in gram.y.  One of the reasons I moved
the logic over to parse_expr.c was the thought that it would be easier
to do it right in parse_expr.c --- gram.y would not be allowed to look
up related operators, which seems necessary to handle the construct
per spec.

> I tried my hand at this last night and think I did an ok first pass.

The main issue in my mind is whether to invent a separate node type for
row comparisons.  This is probably a good idea for a number of reasons,
the most obvious being that there's no way to avoid multiple evaluations
of the subexpressions if you try to expand it into simple comparisons.
Also it seems likely that the planner would find it easier to recognize
the relationship to a multicolumn index than if the thing is expanded.
(But teaching the planner and the index mechanisms themselves about this
is going to be a major project in any case.)

One thing I did not like about your first pass is that it makes
unsupportable assumptions about there being a semantic relationship
between operators named, say, '<' and '<='.  Postgres used to have such
bogosity in a number of places but we've managed to get rid of most of
it.  (Offhand I think the only remaining hard-wired assumption about
operators of particular names having particular semantics is that the
foreign key mechanisms assume '=' must be the right operator to compare
keys with.  Eventually we need to get rid of that too.)

IMHO the right way to do this is to look up a suitable btree operator
class and use the appropriate member operators of that class.  (In a
separate-node-type implementation, we'd probably ignore the operators
as such altogether, and just call the btree comparison function of the
opclass.)  It's not entirely clear to me how to select the opclass when
the initially given inputs are of different types, though.  In the
present code we leave it to oper() to do the right thing, including
possibly coercing the inputs to matching types.  Possibly we should
still apply oper(), but then insist that the selected operator appear
as a btree opclass member, comparable to the way we handle sort
operators now.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] best way to fetch next/prev record based on index

2004-07-28 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> One thing I did not like about your first pass is that it makes
> unsupportable assumptions about there being a semantic relationship
> between operators named, say, '<' and '<='.  

Hm, I think I even had caught that issue on the mailing list previously.

In that case though, it seems even the existing code is insufficient. Instead
of testing whether the operator with strcmp against "=" and "<>" it should
perhaps be looking for an operator class and the strategy number for the
operator and its negator.

-- 
greg


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] best way to fetch next/prev record based on index

2004-07-28 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> One thing I did not like about your first pass is that it makes
>> unsupportable assumptions about there being a semantic relationship
>> between operators named, say, '<' and '<='.  

> In that case though, it seems even the existing code is insufficient.

Well, yeah, we know the existing code is broken ;-)

> Instead of testing whether the operator with strcmp against "=" and
> "<>" it should perhaps be looking for an operator class and the
> strategy number for the operator and its negator.

Probably.  You can find some relevant code in indxpath.c in the stuff
that tries to determine whether partial indexes are relevant.

I think that the ideal behavior is that we not look directly at the
operator name at all.  For example it's not too unreasonable to want
to write (a,b) ~<~ (c,d) if you have an index that uses those
non-locale-aware operators.  We should find the operator that matches
the name and input arguments, and then try to make sense of the operator
semantics by matching it to btree opclasses.

Note that it's possible to find multiple matches, for example if someone
has installed a "reverse sort" opclass.  I think we would want to prefer
a match in the datatype's default opclass, if there is one, but
otherwise we can probably follow the lead of the existing code and
assume that any match is equally good.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly