alb wrote: > Hi Peter, I'll try to comment the code below to verify if I understood > it correctly or missing some major parts. Comments are just below code > with the intent to let you read the code first and my understanding > afterwards.
Let's start with the simplest: > Peter Otten <__pete...@web.de> wrote: >> def show2(self): >> yield str(self) >> for child in self.children: >> yield from child.show2() > > ok, this as well requires some explanation. Kinda lost again. From what > I can naively deduce is that it is a generator that returns the str > defined in the node as __str__ and it shows it for the whole tree. Given a tree A --> A1 A2 --> A21 A22 A3 assume a slightly modified show2(): def append_nodes(node, nodes): nodes.append(node) for child in node.children: append_nodes(child, nodes) When you invoke this with the root node in the above sample tree and an empty list nodes = [] append_nodes(A, nodes) the first thing it will do is append the root node to the nodes list [A] Then it iterates over A's children: append_nodes(A1, nodes) will append A1 and return immediately because A1 itself has not children. [A, A1] append_nodes(A2, nodes) will append A2 and then iterate over A2's children. As A21 and A22 don't have any children append_nodes(A21, nodes) and append_nodes(A22, nodes) will just append the respective node with no further nested ("recursive") invocation, and thus the list is now [A, A1, A21, A22] Finally the append_nodes(A3, nodes) will append A3 and then return because it has no children, and we end up with nodes = [A, A1, A21, A22, A3] Now why the generator? For such a small problem it doesn't matter, for large datasets it is convenient that you can process the first item immmediately, when the following ones may not yet be available. It also becomes easier to implement different treatment of the items or to stop in the process: for deer in hunt(): kill(deer) if have_enough_food(): break for animal in hunt(): take_photograph(animal) if not_enough_light(): break Also, you never need more than one item in memory instead of the whole list for many problems. Ok, how to get from the recursive list building to yielding nodes as they are encountered? The basic process is always the same: def f(items) items.append(3) items.append(6) for i in range(10): items.append(i) items = [] f(items) for item in items: print(item) becomes def g(): yield 3 yield 6 for i in range(10): yield i for item in g(): print(items) In Python 3.3 there was added some syntactic sugar so that you can write def g(): yield 3 yield 6 yield from range(10) Thus def append_nodes(node, nodes): nodes.append(node) for child in node.children: append_nodes(child, nodes) becomes def generate_nodes(node): yield node for child in node.children: yield from generate_nodes(child) This looks a lot like show2() except that it's not a method and thus the node not called self and that the node itself is yielded rather than str(node). The latter makes the function a bit more flexible and is what I should have done in the first place. The show() method is basically the the same, but there are varying prefixes before the node name. Here's a simpler variant that just adds some indentation. We start with generate_nodes() without the syntactic sugar. This is because we need a name for the nodes yielded from the nested generator call so that we can modify them: def indented_nodes(node): yield node for child in node.children: for desc in from indented_nodes(child): yield desc Now let's modify the yielded nodes: def indented_nodes(node): yield [node] for child in node.children: for desc in indented_nodes(child): yield ["***"] + desc How does it fare on the example tree? A --> A1 A2 --> A21 A22 A3 The lists will have an "***" entry for every nesting level, so we get [A] ["***", A1] ["***", A2] ["***", "***", A21] ["***", "***", A22] ["***", A3] With "".join() we can print it nicely: for item in indented_nodes(tree): print("".join(item)) But wait, "".join() only accepts strings so let's change yield [node] to yield [node.name] # str(node) would also work A ***A1 ***A2 ******A21 ******A22 ***A3 >> def show2(root): >> for line in root.show2(): >> print(line) > Here we implement the functions to print a node, but I'm not sure I > understand why do I have to iterate if the main() iterates again over the > nodes. Your example had the structure A A1 A11 A12 A2 and I was unsure if there could be data files that have multiple root nodes, e. g. A A1 A11 A12 A2 B B1 B2 To simplify the handling of these I introduced an artificial root R R A A1 A11 A12 A2 B B1 B2 which makes all toplevel nodes in the data file children of R. In the main() function I iterate over R's children to hide R from the user. You can replace for node in tree.children: show_tree(node) print("") in my original code with show_tree(tree) to see the hidden node. I may address the rest of your post later unless someone else does. In the mean time, can you please provide the data file that triggers the IndexError to help me with the debugging? -- https://mail.python.org/mailman/listinfo/python-list