"Tom Lane" <[EMAIL PROTECTED]> writes: > Gregory Stark <[EMAIL PROTECTED]> writes: >> Having fixed that everything works fine with SET and WITH being reserved >> keywords. You didn't mean to say I should be able to leave WITH unreserved >> did >> you? > > I think we'd decided that was a lost cause, unless you see a way?
No, I decided it was a lost cause right at the start of this. I was just double checking that you didn't think otherwise when you said you didn't see anything else that needed to be reserved. >> Of course that was the easy part... > > Yup ;-) I think I've finally wrapped my head around the idea of mutually recursive and non-linearly recursive queries. Good thing too since in thinking about them I realized that the approach I had intended to use for plain non-recursive queries wasn't really adequate and needs the same machinery. The fundamental feature Postgres needs to implement this that it doesn't have is the ability to be running the same subplan from two different contexts in the plan simultaneously. That is, the ability to start a scan from the beginning without disturbing a scan already in progress for that same node. In retrospect this isn't terribly surprising since it's the same characteristic functions need in a programming language to be safely callable recursively. They need to be reentrant. The normal way to make functions reentrant is to remove all local state and make the caller responsible for providing space to allocate memory for temporary state. But teaching every node type about passing its state up to the parent including bundling up all the state of its children seems like too much work. Both for me and the computer :) In any case that would result in an n^2 algorithm for recursive queries as it would force the executor to re-execute each node for the same record over and over. Much as the obvious recursive fibonacci algorithm is n^2 for example. It also would mean making non-recursive WITH clauses pointless since they would still be executed once per call-site. Instead I suggest we create one type of reentrant node, which memoizes its output. It would be a kind of on-demand Materialize. It would be paired with a second node type which would be the only type of node which can invoke it. This RecursiveCall node would invoke the Memoize node using a special interface in which it passes enough state for the Memoize node to seek to the correct place in its output. Because it memoizes its output it should never actually need to be able to handle recursive calls. When the call depth is maximum each call site would be able to make at most 1 call to the Memoize node. A second one would necessarily have been stopped higher up the call chain by the memoization table. (I've convinced myself that that's true but I should probably work out a good proof of it before I make all this depend on it.) There are three general cases of the Memoize node. Breadth-first, Depth-first, and non-linearly-recursive. In the case of breadth-first search we would want to keep two tuplestore around, one for the current generation, and one for its parent. Once we finish a generation we can free its parent since we know a linearly-recursive scan won't be needing it any more. In the case of depth-first we really want to memoize into a stack. I'm not sure how to fit that into our current model of tuplestores. At worst though, we just allocate a tuplestore for each generation as it appears. We can still throw away old generations once their scans are finished, it just won't be until quite late. Non-linearly-recursive searches are the same case as non-recursive queries. Memoize would just allocate one tuplestore and store tuples blindly without any idea of "generation". It could arrange to throw away old tuples whenever all call-sites' scans have past them. That's the same optimization Simon wants to make in the merge-join case when the mark is moved forward. There are a few open issues to consider. Namely, how to cost a RecursiveCall node. Also, if a subplan has exactly one call-site we really ought to inline it as it will get much more reliable plans. Similarly if there are two call sites we should consider inlining it if the cost of materializing the results (and reading them back) is more than n-call-sites x the cost of executing the query. I would expect That would happen with plain sequential scans for example. Note that we need the Memoize/RecursiveCall nodes even for non-recursive queries. Except in the simplest cases where each common table expression only has a single call site (in which case we could have just inlined them anyways) we need some way to retain state and avoid reexecuting the plan multiple times. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings