On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard <[EMAIL PROTECTED]> wrote:
>
> A possible solution, using the re module:
>
> py> s = """\
> ... Gibberish
> ... 53
> ... MoreGarbage
> ... 12
> ... RelevantInfo1
> ... 10/10/04
> ... NothingImportant
> ... ThisDoesNotMatter
> ... 44
> ... RelevantInfo2
> ... 22
> ... BlahBlah
> ... 343
> ... RelevantInfo3
> ... 23
> ... Hubris
> ... Crap
> ... 34
> ... """
> py> import re
> py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
> ...                    .*
> ...                    ^RelevantInfo2\n([^\n]*)
> ...                    .*
> ...                    ^RelevantInfo3\n([^\n]*)""",
> ...                re.DOTALL | re.MULTILINE | re.VERBOSE)
> py> score = {}
> py> for info1, info2, info3 in m.findall(s):
> ...     score.setdefault(info1, {})[info3] = info2
> ...
> py> score
> {'10/10/04': {'23': '22'}}
>
> Note that I use DOTALL to allow .* to cross line boundaries, MULTILINE
> to have ^ apply at the start of each line, and VERBOSE to allow me to
> write the re in a more readable form.
>
> If I didn't get your dict update quite right, hopefully you can see how
> to fix it!
Thanks! That was very helpful. Unfortunately, I wasn't completely clear when
describing the problem. Is there any way to extract multiple scores from the
same file and from multiple files? (I will probably use the "fileinput" module
to deal with multiple files.) So, say I've got:

    Gibberish
    53
    MoreGarbage
    12
    RelevantInfo1
    10/10/04
    NothingImportant
    ThisDoesNotMatter
    44
    RelevantInfo2
    22
    BlahBlah
    343
    RelevantInfo3
    23
    Hubris
    Crap
    34
    SecondSetofGarbage
    2423
    YouGetThePicture
    342342
    RelevantInfo1
    10/10/04
    HoHum
    343
    MoreStuffNotNeeded
    232
    RelevantInfo2
    33
    RelevantInfo3
    44
    sdfsdf
    RelevantInfo1
    10/11/04
    InsertBoringFillerHere
    43234
    Stuff
    MoreStuff
    RelevantInfo2
    45
    ExcitingIsntIt
    324234
    RelevantInfo3
    60
    Lalala

Sorry for the long and painful example input. Notice that the first two
"RelevantInfo1" fields have the same info but that the RelevantInfo2 and
RelevantInfo3 fields have different info. Also, there will be cases where
RelevantInfo3 is the same while RelevantInfo2 differs.

What I'm hoping for is something along the lines of being able to organize it
like so (don't worry about the format of the output, I'll deal with that
later; "RelevantInfo" is shortened to "Info" for readability):

              Info1[0]                   Info1[1]                   Info1[2] ...
    Info3[0]  Info2[Info1[0],Info3[0]]   Info2[Info1[1],Info3[0]]   ...
    Info3[1]  Info2[Info1[0],Info3[1]]   ...
    Info3[2]  Info2[Info1[0],Info3[2]]   ...
    ...

I don't really care if it's a list, dictionary, array, etc.

Thanks again for your help. The multiline option in the re module is very
useful.

Take care.
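One way to handle several records per file, and several files, is to walk the input line by line rather than matching the whole text with one regex. This is a minimal sketch in Python 3, assuming (from the sample above) that each record gives RelevantInfo1, RelevantInfo2 and RelevantInfo3 in that order, each label on its own line with its value on the next line. The names `collect_scores`, `collect_from_files` and `print_grid` are hypothetical, made up for illustration. It builds a nested dictionary {Info1: {Info3: Info2}}, i.e. the grid above with Info1 as columns and Info3 as rows, and uses `fileinput.FileInput` to feed several files through as one stream of lines. If the same (Info1, Info3) pair can occur twice with different Info2 values, the later one overwrites the earlier one here; storing a list per cell would keep both.

```python
import fileinput
from collections import defaultdict

# Label lines whose value sits on the following line (names taken from the sample input).
LABELS = ("RelevantInfo1", "RelevantInfo2", "RelevantInfo3")


def collect_scores(lines):
    """Build {info1: {info3: info2}} from any iterable of text lines.

    Assumes each record gives RelevantInfo1, RelevantInfo2 and RelevantInfo3
    in that order, each label on its own line with its value on the next line.
    If the same (info1, info3) pair shows up again, the later info2 wins.
    """
    score = defaultdict(dict)
    current = {}    # label -> value for the record currently being read
    pending = None  # label whose value is expected on the next line
    for raw in lines:
        line = raw.strip()
        if pending is not None:
            current[pending] = line
            if pending == "RelevantInfo3":
                # RelevantInfo3 closes a record; store it only if it is complete.
                if "RelevantInfo1" in current and "RelevantInfo2" in current:
                    score[current["RelevantInfo1"]][line] = current["RelevantInfo2"]
                current = {}
            pending = None
        elif line in LABELS:
            pending = line
    return dict(score)


def collect_from_files(paths):
    """Run collect_scores over several files as one continuous stream of lines.

    Note: an empty `paths` list makes fileinput read standard input instead.
    """
    with fileinput.FileInput(files=paths) as lines:
        return collect_scores(lines)


def print_grid(score):
    """Print one row per info3 and one column per info1 (values sorted as plain strings)."""
    cols = sorted(score)
    rows = sorted({info3 for inner in score.values() for info3 in inner})
    print("\t".join([""] + cols))
    for info3 in rows:
        # Leave a cell blank when that info1 has no entry for this info3.
        cells = [score[info1].get(info3, "") for info1 in cols]
        print("\t".join([info3] + cells))
```

On the sample input above, `collect_scores` should return `{'10/10/04': {'23': '22', '44': '33'}, '10/11/04': {'60': '45'}}`, and something like `collect_from_files(["a.txt", "b.txt"])` (hypothetical file names) pools the records from both files into one dictionary. As a regex alternative, changing each `.*` in the quoted pattern to the non-greedy `.*?` should make `findall` return one tuple per record, though a record missing one of its labels would then silently borrow a value from the next record.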