ProvoWallis wrote:
> Hi,
>
> I'm looking for a little advice about regular expressions. I want to
> capture a string of text that falls between an opening square bracket
> and a closing square bracket (e.g., "[" and "]") but I've run into a
> small problem.
>
> I've been using this: '''\[(.*?)\]''' as my pattern. I was expecting
> this to be greedy but the funny thing is that it's not greedy enough in
> some situations.
>
> Here's my problem: The end of my string sometimes contains a cross
> reference to a section in a book and the subsections are cited using
> square brackets exactly like the one I'm using as the ending point in
> my original regular expression.
>
> E.g., the text string in my data looks like this: <core:emph
> typestyle="it">see</core:emph> discussion in
> § 512.16[3][b]]
>
> But my regular expression is stopping after the first "]" so after I
> add the new markup the output looks like this:
>
> <core:emph typestyle="it">see</core:emph> discussion in
> § 512.16[3]</fn:note>[b]]
>
> So the last subsection is outside of the note tag. I want something
> like this:
>
> <core:emph typestyle="it">see</core:emph> discussion in
> § 512.16[3][b]]</fn:note>
>
> I'm not sure how to make my capture more greedy so I've resorted to
> cleaning up the data after I make the first round of replacements:
>
> data = re.sub(r'''\[(\d*?)\]</fn:note>\[(\w)\]\]''',
>               '''[\1][\2]]</fn:note>''', data)
>
> There's got to be a better way but I'm not sure what it is.
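For comparison, here is a minimal sketch using only the standard re module. It shows why the quoted pattern stops early: "*?" is the lazy (non-greedy) quantifier, so it ends at the first "]", while plain "*" is greedy and runs to the last "]" on the line. A bracket-aware body avoids both problems, assuming subsection references are never nested more than one level deep. The sample strings and the <fn:note> wrapping are illustrative assumptions, not the poster's actual data or replacement.

```python
import re

text = "[§ 512.16[3][b]]"

# "*?" is lazy (non-greedy): the match ends at the FIRST "]",
# which is the one closing the "[3]" subsection.
lazy = re.search(r"\[(.*?)\]", text).group(1)
print(lazy)      # § 512.16[3

# "*" is greedy: the match runs to the LAST "]" in the string.
greedy = re.search(r"\[(.*)\]", text).group(1)
print(greedy)    # § 512.16[3][b]

# Greedy breaks as soon as one line holds two separate notes.
text2 = "[first note] and [second, see § 512.16[3][b]]"
print(re.search(r"\[(.*)\]", text2).group(1))
# first note] and [second, see § 512.16[3][b]

# Bracket-aware: the body is any mix of non-bracket characters and
# single-level [..] groups, so the match stops at the "]" that closes
# the outer note. Assumes subsections never nest deeper than one level.
NOTE = re.compile(r"\[((?:[^\[\]]|\[[^\[\]]*\])*)\]")
print(NOTE.findall(text2))
# ['first note', 'second, see § 512.16[3][b]']

# Hypothetical markup, for illustration only: wrap each note's body.
print(NOTE.sub(r"<fn:note>\1</fn:note>", "see [§ 512.16[3][b]]"))
# see <fn:note>§ 512.16[3][b]</fn:note>
```

If references can nest more deeply than that, a single regex stops being a good fit, which is where a real parser helps.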
I know a better way: pyparsing.

    from pyparsing import *

    # a subsection cross-reference such as "[3]" or "[b]": a single
    # alphanumeric character in square brackets, brackets suppressed
    crossref = Suppress("[") + Word(alphanums, exact=1) + Suppress("]")

    # the whole note: "[", the text up to the first cross-reference,
    # any number of cross-references, then the closing "]"
    footnote = (Suppress("[") + SkipTo(crossref) +
                ZeroOrMore(crossref) + Suppress("]"))

    py> footnote.parseString("[§ 512.16[3][b]]").asList()
    ['§ 512.16', '3', 'b']

James

-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
-- 
http://mail.python.org/mailman/listinfo/python-list