Hi all,
I have data files with a format that can be schematized as:

    File Header Contents
    . . .
    File Header End Tag
    Node Header Contents
    . . .
    Node Header End Tag
    Node Contents
    . . .
    Node End Tag
    [Repeat Node elements until end of file]
I'm refactoring the heck out of a file conversion utility I wrote for this format back when I knew even less than I do now =:-0
The main change in refactoring is moving it to OOP. I have a method that serves as the entry point for parsing the files. It separates the file header content and the nodes (or body content), sending them each to appropriate methods to be processed.
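To make the entry point concrete, here is a minimal sketch of the split described above. The class name `FileConverter`, the method names `parse` and `header_parser`, and the value of `FILE_HEADER_END_TAG` are all made up for illustration; only `body_parser` corresponds to a name in the real code, and its body here is a placeholder.

```python
# Hypothetical tag value; the real one is defined elsewhere and includes '\n'.
FILE_HEADER_END_TAG = "[End of File Header]\n"


class FileConverter:
    """Illustrative stand-in for the real converter class."""

    def __init__(self):
        self.header = None
        self.body = None

    def parse(self, lines):
        # list.index raises ValueError if the header end tag is missing,
        # which seems a reasonable failure for a malformed file.
        split_at = lines.index(FILE_HEADER_END_TAG) + 1
        self.header_parser(lines[:split_at])
        self.body_parser(lines[split_at:])

    def header_parser(self, header_lines):
        # Placeholder: just remember what was handed over.
        self.header = header_lines

    def body_parser(self, body_contents):
        # Placeholder: the real method splits the body into nodes.
        self.body = body_contents
```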
I want the body parser to accept a list of lines corresponding to the node portions of my file, separate out each node (everything up to and including its node end tag), and send each node's contents to a further method for processing. What I have now works and is a big improvement on what I had before. But I know that I tend to use while loops more than I perhaps ought, and much of the style of OOP has yet to sink in. So, any suggestions on how to make this method more Pythonic would be most welcome.
(body_contents is a list of file lines, with all file header lines removed.)

    def body_parser(self, body_contents):

        while body_contents:

            count = 0
            current_node_contents = []

            for line in body_contents:
                current_node_contents.append(line)
                count += 1
                # node_end_tag is defined elsewhere and includes '\n'
                if line == node_end_tag:
                    break

            self.node_parser(current_node_contents)
            body_contents = body_contents[count:]

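One direction that might avoid both the while loop and the repeated slicing is a single pass that collects lines and hands off a node each time the end tag turns up. The sketch below is only that, a sketch: `iter_nodes`, `FileConverter`, and the value of `NODE_END_TAG` are invented for illustration, and `node_parser` is a stub that just records what it receives. Like the loop above, it passes along any trailing lines that lack a closing end tag.

```python
# Hypothetical tag value; the real one is defined elsewhere and includes '\n'.
NODE_END_TAG = "[End of Node]\n"


def iter_nodes(lines, end_tag=NODE_END_TAG):
    """Yield each node as a list of lines, end tag line included."""
    node = []
    for line in lines:
        node.append(line)
        if line == end_tag:
            yield node
            node = []
    if node:
        # Leftover lines with no closing end tag: yield them rather
        # than silently dropping them.
        yield node


class FileConverter:
    """Illustrative stand-in for the real converter class."""

    def __init__(self):
        self.parsed_nodes = []

    def node_parser(self, node_lines):
        # Stub: the real method would process the node.
        self.parsed_nodes.append(node_lines)

    def body_parser(self, body_contents):
        for node in iter_nodes(body_contents):
            self.node_parser(node)
```

Keeping the splitting in a plain generator means it can be tried out on a short list of lines without building the whole class around it.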
Another alternative has occurred to me, but it seems to compensate for the avoidance of while by being ugly. Untested code:

    def alt_body_parser(self, body_contents):

        body_contents = ''.join(body_contents)
        body_contents = body_contents.split(node_end_tag)

        # ugly lives here -- having removed node_end_tag's
        # with split, I need to put them back on:
        count = 0
        for i in body_contents:
            body_contents[count] = i + node_end_tag
            count += 1
        # (The sub-alternative of having the node_parser method
        # put them back, while easier, also seems a dangerous
        # separation of responsibility for the integrity of the data
        # format.)

        for i in body_contents:
            self.node_parser(i)

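A possibly tidier take on the split idea, again only a sketch under assumptions: `split_nodes` is a made-up helper name and the tag value in the test data is invented. Two things seem worth noting about the split approach in general. First, when the text ends with the end tag, `str.split` returns an empty string as its last piece, so blindly re-appending the tag to every piece would produce a spurious final node consisting of the tag alone. Second, each node comes out as one string rather than a list of lines, and the match is on a substring, so a line that merely ends with the tag text would also split a node.

```python
def split_nodes(body_contents, end_tag):
    """Return each node as a single string with its end tag restored."""
    text = "".join(body_contents)
    # The last piece is whatever follows the final end tag; it is an
    # empty string when the text ends with the tag.
    *complete, tail = text.split(end_tag)
    nodes = [piece + end_tag for piece in complete]
    if tail:
        # Trailing content with no closing end tag is kept as-is.
        nodes.append(tail)
    return nodes
```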
So, which of these 2 (and a half) ways seems most Pythonic to the more experienced? Any better ways I've overlooked?
Thanks, and best to all,
Brian vdB
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor