The last bit of performance-related hacking that seems to be needed in the window functions patch is to fix things so that we can trim old rows from the underlying tuplestore when they're no longer needed. In particular I think it's critical to be able to do this for the simple cases of lead() and lag() with constant offsets. This topic has been mentioned repeatedly in the discussions to date, but nothing much got done.
The components we seem to be missing are: 1. lead/lag have to be able to tell whether their offset arguments are constants. 2. The WindowObject API has to be extended so that a window function can tell the executor that it will never again fetch a row before row X. (This has to be cheap enough that we can do it on each call.) 3. The tuplestore API itself seems a bit shy of a load --- what we do not have is a concept that a particular read pointer can move back and forth, but is guaranteed never to move back before some-defined-place. (This is clearly an oversight in my recent redesign of that API.) I don't think it's hard to fix #1. What I propose we do is extend the existing fmgr.h APIs that allow inquiry about the data types of function arguments -- say, something along the line of extern bool get_fn_arg_is_const(FmgrInfo *flinfo, int argnum) This would at bottom be an "IsA(arg_expr, Const)" type of test but it seems best to keep the details hidden from caller functions. (Note: I've already fixed the WindowAgg infrastructure to support get_fn_expr_argtype and friends --- this really isn't optional if you expect polymorphic window functions to be common, and the evidence of the built-in ones is that they are. It's pure luck that none of the built-in ones has had any need to explicitly find out what data type it's processing.) #2 is also easy given a solution to #3; the WinForgetBefore() call would just translate to some appropriate action on the tuplestore read pointer associated with the window function's WindowObject. So I think the problem really reduces to fixing the tuplestore API some more. One way to go at it would be to re-introduce an explicit notion of a "mark" pointer associated with (some or all) read pointers, and have a capability bit saying that a read pointer is allowed to go backward so long as it doesn't go before its mark pointer. However I really like the simplicity of the current API design where all read pointers are alike and there's no explicit notion of a mark pointer. We could keep that simplicity by having a capability bit that says "this pointer can move backward so long as it doesn't become the oldest one". In this design you still need a mark pointer to delimit how much tuplestore you want to keep, but there's no explicit link between any one mark pointer and any one active read pointer so far as tuplestore.c is aware. (Note this'd imply that a WindowObject contains both a mark pointer and a read pointer for its window function, and the WinForgetBefore() call actually moves the mark pointer.) A possible objection is that it's a bit more error-prone to not explicitly say which other read pointer is intended to be the movement fence for your read pointer; but I'm not convinced whether that's important or not. Also, we could really do this without any tuplestore API changes at all, if we were willing to implement any backing-up of the read pointer as copying the read pointer from the mark pointer and then advancing the read pointer. However that could get pretty inefficient if there were any significant distance between the mark and read pointers. Another thing that I'm thinking is that it was a mistake to not explicitly expose tuplestore_trim() as part of the tuplestore API. It seems that the callers always have a much better idea than tuplestore.c of when it would be worth trying to trim the tuplestore. The current WF patch wants to put trim calls into such bizarre places as select_read_pointer, which is utterly counterintuitive and usually a waste of cycles too. If we changed the rule to be that we trim when the caller tells us to, we could do something saner --- I think that really WindowAgg only needs to trim once per input tuple. We'd have to add a line or two in some other places like mergejoin, but it'd really not be much of a problem. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers