Summary: I am looking for feedback on a potential technique for coding recursive generators.
Often, the most convenient algorithms for producing a sequence of results are recursive, while the most convenient interface for consuming those results is an iterator. PEP 380 introduced the "yield from" syntax, which lets generators delegate to other generators, including recursively. The trouble is that recursion and generators don't necessarily play all that well together from a performance point of view. As Greg Ewing noted in PEP 380: "The overhead of passing __next__() calls and yielded values down and up the chain can cause what ought to be an O(n) operation to become, in the worst case, O(n**2)." Ewing goes on to describe some optimizations possible in Python implementations. However, in the Python versions I have checked, there is still O(depth) overhead in a chain of "yield from" calls.

To make this concrete, let's consider a problem with a canonical recursive implementation -- an inorder tree traversal. (My actual motivation for thinking about this issue comes from implementing generators for various kinds of combinatorial objects, but tree traversal provides a good, familiar example.) For discussion purposes, let's represent a tree node as a tuple of the form (key, left, right). Then a simple tree might be

```python
tree1 = ('b', ('a', None, None), ('c', None, None))
```

Or a badly unbalanced example could be:

```python
tree2 = ('a', None, ('b', None, ('c', None, ('d', None, ('e', None, ('f', None, ('g', None, ('h', None, ('i', None, None)))))))))
```

Then we can code inorder traversal as:

```python
def inorder_traverse(node):
    k, l, r = node
    if l:
        yield from inorder_traverse(l)
    yield k
    if r:
        yield from inorder_traverse(r)
```

So far, all very familiar. The trouble is that if you are at depth d, every value yielded by the algorithm gets passed through d stack frames before it reaches the actual consumer of the data.
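As a quick sanity check (repeating the definitions above so the snippet is self-contained), consuming the generator yields the keys in sorted order for both trees:

```python
def inorder_traverse(node):
    # Recursive inorder traversal, as defined above.
    k, l, r = node
    if l:
        yield from inorder_traverse(l)
    yield k
    if r:
        yield from inorder_traverse(r)

tree1 = ('b', ('a', None, None), ('c', None, None))

# Build the badly unbalanced chain a -> b -> ... -> i from the bottom up,
# equivalent to the tree2 literal above.
tree2 = ('i', None, None)
for key in 'hgfedcba':
    tree2 = (key, None, tree2)

print(list(inorder_traverse(tree1)))  # ['a', 'b', 'c']
print(list(inorder_traverse(tree2)))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
```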
This issue, specifically the case of tree traversal, was the subject of a 2016 thread on comp.lang.python, "Linear Time Tree Traversal Generator". I am not reviving that 5+-year-old thread since my objectives are a little different. In that thread, Ned Batchelder suggested (and gave a nice implementation for) reimplementing the traversal to use an explicit stack and a loop, so that "each yield directly produces a value to the caller." That, to my mind, basically solved the original poster's problem -- the code for the iterative version is reasonably concise and understandable, and it is faster than the recursive version both asymptotically and for small problems.

But when the candidate recursive algorithm is more complicated than tree traversal, translating it to an iterative version can be difficult. Maintaining a stack of local variables is not too hard, but converting a recursive function to an iterative one completely changes the flow, such that familiar control structures become a tangled mass of GOTOs. (Which is even harder in a language without a GOTO.) The result, at least when I have tried, can be hard to write and, worse, hard to read.

Looking at the problem, I realized there might be a middle ground -- avoiding the O(depth) cost of a chain of "yield from" clauses while still keeping the recursive code reasonably close to the original. The idea is to modify the recursive function by replacing the "yield from" forms with plain "yield". The transformed function is run from a driver, which simulates the normal call stack. The driver checks each result yielded by the function -- if it is a regular value, it is passed (yielded) to the caller immediately, but if it is a generator, it is pushed onto the stack.
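For reference, an explicit-stack inorder traversal in the spirit of Batchelder's suggestion (a sketch of the general technique, not his actual code) can look like this:

```python
def inorder_iterative(root):
    # An explicit stack replaces the call stack, so each yield goes
    # directly to the consumer with no chain of delegating frames.
    stack = []
    node = root
    while stack or node:
        while node:            # walk down the left spine
            stack.append(node)
            node = node[1]     # left child
        node = stack.pop()
        yield node[0]          # key
        node = node[2]         # continue with the right child

tree1 = ('b', ('a', None, None), ('c', None, None))
print(list(inorder_iterative(tree1)))  # ['a', 'b', 'c']
```

Note how the two nested loops already obscure the original recursive structure; for algorithms more complicated than traversal, this kind of conversion gets harder still.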
The Python code for such a driver is easier to understand than any English description I have been able to create, so here it is:

```python
import types

def recursive_generator_driver(g):
    """g is a generator object, which is likely to call other generators"""
    stack = [g]
    while True:
        g = stack[-1]
        try:
            val = next(g)
            if isinstance(val, types.GeneratorType):
                # Yielded value represents a recursive call, so push it
                stack.append(val)
            else:
                # Regular value for the iterator; give it to the caller
                yield val
        except StopIteration:
            stack.pop()
            if len(stack) == 0:
                return
```

and the modified tree traverser is just

```python
def inorder_traverse_mod(node):
    """ONLY for use with recursive_generator_driver"""
    k, l, r = node
    if l:
        yield inorder_traverse_mod(l)  # NOT yield from
    yield k
    if r:
        yield inorder_traverse_mod(r)  # NOT yield from
```

This seems to work correctly (at least in my testing). But does saving the "yield from" cost outweigh the cost of the driver function, with its explicit stack and isinstance() calls? The tradeoffs depend on the specific Python implementation and on the algorithm being modified. For the tree traversal example, with PyPy (Ubuntu, 7.3.1) the driver version becomes faster at a complete binary tree of height 12 (8191 nodes), while for CPython 3.8.10 it is not until height 22 that the driver version dominates. A perfectly balanced tree is the best case for the "yield from" code; with a sufficiently badly unbalanced tree, the driver version can be many times faster.

So what do people think of this hack/technique? Is it:

A) A terrible idea? (Please be specific.)
B) Already well known (and I just missed it)?
C) Potentially useful for the niche case of deeply recursive generators?

Some additional notes (and thanks if you are still reading this long posting):

1) The same driver can be used with different recursive functions, so this is easy to try out.
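Putting the pieces together, a self-contained check (repeating the definitions so it runs on its own, with the driver's loop compacted slightly) shows the driver producing the same sequence as the "yield from" version would:

```python
import types

def recursive_generator_driver(g):
    # Run g, pushing any yielded generator onto an explicit stack
    # (a simulated recursive call) and passing ordinary values
    # straight through to the caller.
    stack = [g]
    while stack:
        try:
            val = next(stack[-1])
        except StopIteration:
            stack.pop()
            continue
        if isinstance(val, types.GeneratorType):
            stack.append(val)
        else:
            yield val

def inorder_traverse_mod(node):
    k, l, r = node
    if l:
        yield inorder_traverse_mod(l)  # NOT yield from
    yield k
    if r:
        yield inorder_traverse_mod(r)  # NOT yield from

# A badly unbalanced chain a -> b -> ... -> i, all right children.
tree = ('i', None, None)
for key in 'hgfedcba':
    tree = (key, None, tree)

print(list(recursive_generator_driver(inorder_traverse_mod(tree))))
# ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
```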
2) If a function can be converted to a purely iterative form (as with Batchelder's tree traversal), it will often be faster than either the driver or the "yield from" version. Hence the driver version would be used for those functions where conversion to an iterative version would be more difficult.

3) If you don't like the isinstance() calls, the type can be signaled some other way. For example, the yield statements can pass back a pair whose first element is a flag indicating whether the second element is a result value or a generator.

4) This approach means that recursion depth is not restricted by the size of the Python stack. That might be a benefit, but if not, the driver could be modified to enforce a maximum recursion depth.

-- https://mail.python.org/mailman/listinfo/python-list
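P.S. To illustrate note 3, here is one hypothetical sketch of the pair-based signaling (the CALL/VALUE flags and the tagged_driver and inorder_tagged names are mine, invented for illustration):

```python
CALL, VALUE = True, False   # flag: is the payload a recursive call?

def inorder_tagged(node):
    k, l, r = node
    if l:
        yield CALL, inorder_tagged(l)
    yield VALUE, k
    if r:
        yield CALL, inorder_tagged(r)

def tagged_driver(g):
    # Like recursive_generator_driver, but dispatches on an explicit
    # flag instead of an isinstance() check.
    stack = [g]
    while stack:
        try:
            is_call, payload = next(stack[-1])
        except StopIteration:
            stack.pop()
            continue
        if is_call:
            stack.append(payload)
        else:
            yield payload

tree1 = ('b', ('a', None, None), ('c', None, None))
print(list(tagged_driver(inorder_tagged(tree1))))  # ['a', 'b', 'c']
```

A side benefit of the flag is that the generator can then legitimately yield generator objects as ordinary result values, which the isinstance() version cannot distinguish from recursive calls.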