On 29/08/11 20:21, William Gill wrote:
> I haven't done much with Python for a couple years, bouncing around
> between other languages and scripts as needs suggest, so I have some
> minor difficulty keeping Python functionality in my head, but I can
> overcome that as the cobwebs clear. Though I do seem to keep tripping
> over the same Py2 -> Py3 syntax changes (old habits die hard).
>
> I have a text file with XML-like records that I need to parse. By
> XML-like I mean records have proper opening and closing tags, but
> fields don't have closing tags (they rely on line ends). Not all fields
> appear in all records, but they do adhere to a defined sequence.
>
> My initial passes into Python have been very unfocused (a scatter gun
> of too many possible directions, yielding very messy results), so I'm
> asking for some suggestions, or algorithms (possibly even examples)
> that may help me focus.
>
> I'm not asking anyone to write my code, just to nudge me toward a more
> disciplined approach to a common task, and I promise to put in the
> effort to understand the underlying fundamentals.

A name that is often thrown around on this list for this kind of
question is pyparsing. I don't know anything about it myself, but it
may be worth looking into.

Otherwise, if you say it's similar to XML, you might want to take a cue
from XML processing when it comes to dealing with the file. You could
emulate the stream-based approach taken by SAX or Expat: have methods
that handle the different events that can occur. For XML these are
"start tag", "end tag", "text node", "processing instruction", etc.; in
your case, they might be "start/end record", "field data", etc.

That way, you could separate the code that keeps track of the current
record, and of how the data fits together into an object structure,
from the parsing code, which knows how to convert a line of data into
something meaningful.

Thomas
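
P.S. To make the event-based idea above a bit more concrete, here is a
minimal sketch in Python 3. The file format is a guess, since no sample
was posted: it assumes each record opens with a line like <record> and
closes with </record>, and each field is a line like <name>Alice. The
tag names, the parse() function and the RecordCollector class are all
made up for illustration, so adjust them to the real data.

```python
import re

# Assumed line shapes (hypothetical, adjust to the real file):
#   <record>       opens a record
#   <name>Alice    a field: a tag followed by data up to the line end
#   </record>      closes a record
FIELD_RE = re.compile(r"<(\w+)>(.*)")


def parse(lines, handler, record_tag="record"):
    """Parsing side: turn each line into an event on the handler."""
    open_tag = "<%s>" % record_tag
    close_tag = "</%s>" % record_tag
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line == open_tag:
            handler.start_record()
        elif line == close_tag:
            handler.end_record()
        else:
            match = FIELD_RE.fullmatch(line)
            if match is None:
                raise ValueError("unrecognised line: %r" % line)
            handler.field(match.group(1), match.group(2))


class RecordCollector:
    """Bookkeeping side: builds one dict per record from the events."""

    def __init__(self):
        self.records = []
        self.current = None  # dict for the record being read, if any

    def start_record(self):
        self.current = {}

    def field(self, name, value):
        if self.current is None:
            raise ValueError("field %r outside a record" % name)
        self.current[name] = value

    def end_record(self):
        self.records.append(self.current)
        self.current = None


sample = """\
<record>
<name>Alice
<email>alice@example.com
</record>
<record>
<name>Bob
</record>
"""

collector = RecordCollector()
parse(sample.splitlines(), collector)
print(collector.records)
```

The point of the split is that parse() only knows what a line looks
like, while the handler only knows what to do with records and fields.
Checking that fields arrive in the defined sequence, or building
objects instead of dicts, would then only mean changing the handler.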