Re: Multiline regex help

Steven Bethard Thu, 03 Mar 2005 09:00:05 -0800

Yatima wrote:

Hey Folks,

I've got some info in a bunch of files that kind of looks like so:

Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34

and so on...

Anyhow, these "fields" repeat several times in a given file (number of
repetitions varies from file to file). The number on the line following the
"RelevantInfo" lines is really what I'm after. Ideally, I would like to have
something like so:

RelevantInfo1 = 10/10/04 # The variable name isn't actually important
RelevantInfo3 = 23       # it's just there to illustrate what info I'm
                         # trying to snag.

Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2


A possible solution, using the re module:

py> s = """\
... Gibberish
... 53
... MoreGarbage
... 12
... RelevantInfo1
... 10/10/04
... NothingImportant
... ThisDoesNotMatter
... 44
... RelevantInfo2
... 22
... BlahBlah
... 343
... RelevantInfo3
... 23
... Hubris
... Crap
... 34
... """
py> import re
py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
...                    .*?
...                    ^RelevantInfo2\n([^\n]*)
...                    .*?
...                    ^RelevantInfo3\n([^\n]*)""",
...                re.DOTALL | re.MULTILINE | re.VERBOSE)
py> score = {}
py> for info1, info2, info3 in m.findall(s):
...     score.setdefault(info1, {})[info3] = info2
...
py> score
{'10/10/04': {'23': '22'}}

Note that I use DOTALL to allow .* to cross line boundaries, MULTILINE to have ^ apply at the start of each line, and VERBOSE to allow me to write the re in a more readable form.
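
Since the fields repeat several times per file, it matters how much each gap between fields is allowed to swallow. A greedy .* under DOTALL runs to the last RelevantInfo2 and RelevantInfo3 in the text, so a repeated block would yield a single match that mixes records. The sketch below uses non-greedy .*? gaps so each match stops at the nearest following field. The build_scores name and the two-record sample are made up for illustration, not taken from the real files:

```python
import re

# Three-field pattern with non-greedy gaps (.*?) so that each match
# stays within one repetition of the fields.
PATTERN = re.compile(r"""^RelevantInfo1\n([^\n]*)
                         .*?
                         ^RelevantInfo2\n([^\n]*)
                         .*?
                         ^RelevantInfo3\n([^\n]*)""",
                     re.DOTALL | re.MULTILINE | re.VERBOSE)

def build_scores(text):
    """Return {info1: {info3: info2}} for every repetition in text."""
    score = {}
    for info1, info2, info3 in PATTERN.findall(text):
        score.setdefault(info1, {})[info3] = info2
    return score

# Hypothetical sample: the same block of fields repeated twice.
record = ("Gibberish\n53\nRelevantInfo1\n%s\nNothingImportant\n44\n"
          "RelevantInfo2\n%s\nBlahBlah\n343\nRelevantInfo3\n%s\nCrap\n34\n")
sample = record % ("10/10/04", "22", "23") + record % ("10/11/04", "7", "5")

scores = build_scores(sample)
print(scores)
# {'10/10/04': {'23': '22'}, '10/11/04': {'5': '7'}}
```

With greedy gaps, the same sample would give one match pairing the first RelevantInfo1 value with the last RelevantInfo2 and RelevantInfo3 values.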

If I didn't get your dict update quite right, hopefully you can see how to fix it!
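
One possible adjustment, assuming the RelevantInfo2 value should be stored as a number (as in your Score[...] = 22 line) rather than as a string. This is only a sketch: the group names info1/info2/info3 and the build_int_scores helper are my own invention, and int() will raise ValueError if that line ever holds something other than digits:

```python
import re

# Named groups make it easier to see which captured value goes where.
pattern = re.compile(r"""^RelevantInfo1\n(?P<info1>[^\n]*)
                         .*?
                         ^RelevantInfo2\n(?P<info2>[^\n]*)
                         .*?
                         ^RelevantInfo3\n(?P<info3>[^\n]*)""",
                     re.DOTALL | re.MULTILINE | re.VERBOSE)

def build_int_scores(text):
    """Return {info1: {info3: int(info2)}} for every repetition in text."""
    score = {}
    for match in pattern.finditer(text):
        inner = score.setdefault(match.group("info1"), {})
        # Assumption: the RelevantInfo2 line is always a plain integer.
        inner[match.group("info3")] = int(match.group("info2"))
    return score

text = ("RelevantInfo1\n10/10/04\nNothingImportant\n44\n"
        "RelevantInfo2\n22\nBlahBlah\n343\nRelevantInfo3\n23\nCrap\n34\n")
int_scores = build_int_scores(text)
print(int_scores)
# {'10/10/04': {'23': 22}}
```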

HTH,

STeVe
--
http://mail.python.org/mailman/listinfo/python-list
