Hey Folks,
I've got some info in a bunch of files that kind of looks like so:
Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34
and so on...
Anyhow, these "fields" repeat several times in a given file (number of repetitions varies from file to file). The number on the line following the "RelevantInfo" lines is really what I'm after. Ideally, I would like to have something like so:
RelevantInfo1 = 10/10/04   # The variable name isn't actually important
RelevantInfo3 = 23         # it's just there to illustrate what info I'm
                           # trying to snag.
Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2
A possible solution, using the re module:
py> s = """\
... Gibberish
... 53
... MoreGarbage
... 12
... RelevantInfo1
... 10/10/04
... NothingImportant
... ThisDoesNotMatter
... 44
... RelevantInfo2
... 22
... BlahBlah
... 343
... RelevantInfo3
... 23
... Hubris
... Crap
... 34
... """
py> import re
py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
...                    .*?
...                    ^RelevantInfo2\n([^\n]*)
...                    .*?
...                    ^RelevantInfo3\n([^\n]*)""",
...                re.DOTALL | re.MULTILINE | re.VERBOSE)
py> score = {}
py> for info1, info2, info3 in m.findall(s):
...     score.setdefault(info1, {})[info3] = info2
...
py> score
{'10/10/04': {'23': '22'}}
Note that I use DOTALL to allow .* to cross line boundaries, MULTILINE to have ^ apply at the start of each line, and VERBOSE to allow me to write the re in a more readable form.
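Since the fields repeat several times per file, it also matters how much the gap between the markers is allowed to swallow. Here is a minimal sketch of that, assuming a made-up sample with two repetitions (the second record's values are invented for illustration). It uses the non-greedy .*? so that each match stays inside a single record; a greedy .* would stretch from the first RelevantInfo1 to the last RelevantInfo2 and RelevantInfo3 and yield only one match.

```python
import re

# Hypothetical sample with two repetitions of the fields
# (the second record is invented for illustration).
sample = """\
Gibberish
53
RelevantInfo1
10/10/04
NothingImportant
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
RelevantInfo1
10/11/04
ThisDoesNotMatter
44
RelevantInfo2
17
RelevantInfo3
9
Crap
34
"""

record = re.compile(r"""
    ^RelevantInfo1\n([^\n]*)   # value on the line after RelevantInfo1
    .*?                        # shortest possible gap, so a match stays in one record
    ^RelevantInfo2\n([^\n]*)   # value on the line after RelevantInfo2
    .*?
    ^RelevantInfo3\n([^\n]*)   # value on the line after RelevantInfo3
    """, re.DOTALL | re.MULTILINE | re.VERBOSE)

score = {}
for info1, info2, info3 in record.findall(sample):
    score.setdefault(info1, {})[info3] = info2

print(score)
```

With VERBOSE, the whitespace and the # comments inside the pattern are ignored, which is what makes it possible to annotate each piece like this.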
If I didn't get your dict update quite right, hopefully you can see how to fix it!
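In case the setdefault call looks opaque: it returns the inner dict for info1, creating an empty one first if the key isn't there yet. A rough alternative sketch, assuming you'd rather use collections.defaultdict for the nesting (the tuples below are hypothetical stand-ins for what findall returns):

```python
from collections import defaultdict

# Hypothetical (info1, info2, info3) tuples, shaped like findall() results.
matches = [
    ("10/10/04", "22", "23"),
    ("10/10/04", "30", "25"),
    ("10/11/04", "17", "9"),
]

# Each missing outer key gets a fresh inner dict automatically.
score = defaultdict(dict)
for info1, info2, info3 in matches:
    score[info1][info3] = info2

# Convert back to plain dicts for display or comparison.
result = {key: dict(inner) for key, inner in score.items()}
print(result)
```

Either way, two records that share the same RelevantInfo1 value end up under the same outer key, so swap the key order around if that's not the grouping you want.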
HTH,
STeVe