Hi list,

I'm looking for ideas for a pretty, Pythonic solution to a specific problem that I keep solving over and over without ever being happy with the result. It always works, but it is never pretty. So see this as an open-ended brainstorming question.
Here's the task: There's a custom file format. Each line can be parsed individually and, given the current context, the meaning of each line is always clearly distinguishable. I'll give an easy example to demonstrate:

    moo = koo
    bar = foo
    foo :=
     abc
     def
    baz = abc

Let's say the root context knows only two regexes and gives them names:

    keyvalue: \w+ = \w+
    start-multiblock: \w+ :=

The keyvalue is self-contained: when the line is successfully parsed, all the information is present. The start-multiblock, however, gives us only part of the puzzle, namely the name of the following block. In the multiblock context, there are different regexes that can match (actually only one):

    multiblock-item: \s\w+

Now obviously when the block is finished, there's no delimiter. The end is implied by the multiblock-item regex not matching, so we backtrack to the previous parser (the root parser) and can successfully parse the last line, baz = abc.

Keep in mind that even though this is a simple example, generally you'll have multiple contexts, many more regexes and, especially, nesting inside these contexts.

Without using a parser generator (which is usually too much overhead for the cases I deal with), what I usually end up doing is building a state machine by hand. I.e., I remember the current context, try its matchers and, upon no match, manually delegate the input to the matchers of the context I backtrack to. This results in AWFULLY ugly code.

I'm wondering what your ideas are to solve this neatly in a Pythonic fashion without having to rely on third-party dependencies.

Cheers,
Joe

--
https://mail.python.org/mailman/listinfo/python-list
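
To make the hand-built approach described above concrete, here is a minimal sketch of it as an explicit context stack. It is only an illustration under stated assumptions, not a recommended design: the names ROOT, MULTIBLOCK, OPENS, match_context and parse are made up for this example, the capture groups are added to the regexes from the post, lines are matched with fullmatch, and the item regex is loosened to \s+ so that any indentation depth is accepted.

```python
import re

# Rule tables, one per context. Rule names follow the post; the capture
# groups and the use of fullmatch() are assumptions of this sketch.
ROOT = {
    "keyvalue": re.compile(r"(\w+) = (\w+)"),
    "start-multiblock": re.compile(r"(\w+) :="),
}
MULTIBLOCK = {
    # Assumption: \s+ instead of \s, so any amount of indentation matches.
    "multiblock-item": re.compile(r"\s+(\w+)"),
}
# Rules that open a nested context when they match (hypothetical table).
OPENS = {"start-multiblock": MULTIBLOCK}


def match_context(context, line):
    """Return (rule_name, groups) for the first matching rule, else None."""
    for name, regex in context.items():
        m = regex.fullmatch(line)
        if m:
            return name, m.groups()
    return None


def parse(lines):
    """Yield (depth, rule_name, groups) for every line.

    On a failed match the innermost context is popped and the same line is
    retried in the enclosing one, which is the implicit end-of-block rule.
    """
    stack = [ROOT]
    for line in lines:
        line = line.rstrip("\n")
        while True:
            hit = match_context(stack[-1], line)
            if hit:
                break
            if len(stack) == 1:
                raise ValueError(f"no rule matches line {line!r}")
            stack.pop()  # block ended implicitly; retry in the parent
        name, groups = hit
        yield len(stack) - 1, name, groups
        if name in OPENS:
            stack.append(OPENS[name])


if __name__ == "__main__":
    sample = ["moo = koo", "bar = foo", "foo :=", " abc", " def", "baz = abc"]
    for depth, name, groups in parse(sample):
        print(depth, name, groups)
```

Deeper nesting would only need more entries in OPENS, since the stack handles the backtracking. Whether this counts as pretty is of course the open question.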