On Wed, 03 May 2006 10:29:55 -0700, ProvoWallis wrote:
> I only have one issue that I can't figure out. When I print the new
> string I'm getting all of the values in the lt list rather than just
> the one that corresponds to the original entry.

I did not realize that each entry would have its own LT value. I had
thought that there were several sets of <SC> and <XC> with one <LT>.
You only showed one example...

I have modified the program to collect LT values at the same time it
collects SC and XC values. Also, it now collects whatever code appears
before the first SC code. I don't know what this code is for so I just
called the variable "before".

Notes on the code:

* Instead of doing this:

      title = m.group(2)
      title = title.strip()

  I just do this:

      title = m.group(2).strip()

  You can apply string methods on any string, and it's convenient to do
  it all in one line. There are several lines like that.

* There are two patterns to detect the LT code. The first one is for
  finding it, and the second one is only for removing it. The second
  one uses '^' to anchor the pattern, so it will only remove the LT code
  if the LT code is the first thing in the string. The first pattern
  does not have the '^' anchor so it will look ahead, past any number of
  <SC> codes, to find the next <LT> code.

* Otherwise this is pretty much like the first version. It collects
  data, saves it in a list, and then prints its output from the list.

I am busy now, so I won't have any time to make any more versions of
this for you. I hope you can study what I have done and understand how
to apply the ideas to your problems.

Good luck!
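
To make the second note concrete, here is a small sketch of how the two LT patterns behave. The patterns are the same as in the program below; the sample strings and the variable names (sample, leading, found, unchanged, removed) are made up just for this illustration.

```python
import re

# Same two patterns as in the program; everything else here is
# illustrative only.
pat_lt = re.compile(r"<LT>([^<]+)")          # unanchored: finds the next <LT>
pat_lt_remove = re.compile(r"^<LT>([^<]+)")  # anchored: only at string start

sample = "<1><SC>PROC GUIDE<XC>92<LT>1(b)(1)"

# search() scans forward, so the unanchored pattern looks past the <SC> code.
found = pat_lt.search(sample).group(1).strip()

# The anchored pattern does not match here, so sub() leaves the string alone.
unchanged = pat_lt_remove.sub("", sample, 1)

# Once <LT> is the first thing in the string, the anchored pattern removes it.
leading = "<LT>1(b)(1)<1><SC>FAM LAW ENF<XC>259-232<LT>-687"
removed = pat_lt_remove.sub("", leading, 1)

print(found)                # 1(b)(1)
print(unchanged == sample)  # True
print(removed)              # <1><SC>FAM LAW ENF<XC>259-232<LT>-687
```

This is why an entry with no LT code of its own picks up the LT value of the next entry that has one: the search finds it, but nothing is removed until that LT code is at the front of the string.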

-- cut here -- cut here -- cut here -- cut here -- cut here --
import re

s = "<1><SC>APPEAL<XC>40-24; 40-46; 42-46; 42-48; 42-62; 42-63 " + \
    "<1><SC>PROC GUIDE<XC>92<LT>1(b)(1)" + \
    "<1><SC>FAM LAW ENF<XC>259-232<LT>-687" + \
    "<1><SC>APPEAL<XC>40-38; 40-44; 44-18; 45-15<LT>1"

s_space = " "  # a single space
s_empty = ""   # empty string

# Raw strings keep the regex backslashes from being read as string escapes.
pat_sc = re.compile(r"\s*(<[^<]+)<SC>([^<]+)<XC>([^<]+)")
pat_lt = re.compile(r"<LT>([^<]+)")          # find the next LT code
pat_lt_remove = re.compile(r"^<LT>([^<]+)")  # remove a leading LT code only

lst = []
lt = None
while True:
    m = pat_sc.search(s)
    if not m:
        break

    before = m.group(1).strip()
    title = m.group(2).strip()
    xc = m.group(3).replace(s_space, s_empty)

    # Remove the entry just parsed from the front of the string.
    s = pat_sc.sub(s_empty, s, 1)

    m = pat_lt.search(s)
    if m:
        lt = m.group(1)
        lt = lt.strip()
        s = pat_lt_remove.sub(s_empty, s, 1)

    tup = (before, title, xc, lt)
    lst.append(tup)

# Print one output line per XC value.
for before, title, xc, lt in lst:
    lst_pp = xc.split(";")
    for pp in lst_pp:
        print("%s<SC>%s<XC>%s<LT>%s" % (before, title, pp, lt))
-- cut here -- cut here -- cut here -- cut here -- cut here --

-- 
Steve R. Hastings    "Vita est"
[EMAIL PROTECTED]    http://www.blarg.net/~steveha

-- 
http://mail.python.org/mailman/listinfo/python-list