On Oct 29, 7:01 pm, Tim Chase <[EMAIL PROTECTED]> wrote:
> > I need a regex expression which returns the start to the x=ANIMAL for
> > only the x=Dog fragments so all my entries should be start ...
> > (something here) ... x=Dog . So I am really interested in fragments 1
> > and 3 only.
>
> > My idea (primitive) ^start.*?x=Dog doesn't work because clearly it
> > would return results
>
> > start
> > x=Dog # (good)
>
> > and
>
> > start
> > x=Cat
> > stop
> > start
> > x=Dog # bad since I only want start ... x=Dog portion
>
> Looks like the following does the trick:
>
> >>> s = """start #frag 1 start
> ... x=Dog # frag 1 end
> ... stop
> ... start # frag 2 start
> ... x=Cat # frag 2 end
> ... stop
> ... start #frag 3 start
> ... x=Dog #frag 3 end
> ... stop"""
> >>> import re
> >>> r = re.compile(r'^start.*\nx=Dog.*\nstop.*', re.MULTILINE)
> >>> for i, result in enumerate(r.findall(s)):
> ...     print i, repr(result)
> ...
> 0 'start #frag 1 start\nx=Dog # frag 1 end\nstop'
> 1 'start #frag 3 start\nx=Dog #frag 3 end\nstop'
>
> -tkc
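For anyone on Python 3 (the session above uses the Python 2 print statement), here is a minimal sketch of the same findall approach. The pattern is the one quoted above; the function name adjacent_dog_fragments and the extra fragment 4 in the sample (with a y=1 line between start and x=Dog) are hypothetical additions for illustration only.

```python
import re

# Sample data from the thread, plus a hypothetical fragment 4 in which
# another line (y=1) sits between "start" and "x=Dog".
SAMPLE = """start #frag 1 start
x=Dog # frag 1 end
stop
start # frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
x=Dog #frag 3 end
stop
start #frag 4 start
y=1
x=Dog #frag 4 end
stop"""

# A "start" line, immediately followed by an "x=Dog" line, then a "stop" line.
# Without re.DOTALL, ".*" never crosses a newline, so the lines must be adjacent.
ADJACENT = re.compile(r'^start.*\nx=Dog.*\nstop.*', re.MULTILINE)


def adjacent_dog_fragments(text):
    """Return whole fragments whose x=Dog line directly follows start."""
    return ADJACENT.findall(text)


for i, frag in enumerate(adjacent_dog_fragments(SAMPLE)):
    print(i, repr(frag))
```

Only fragments 1 and 3 come back; fragment 4 is skipped because x=Dog is not on the line right after start.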
This will only work if 'x=Dog' directly follows 'start' (which is the case
in the given example). If that's not necessarily the case, I would do it
in two steps (in fact I probably wouldn't use regexps at all, but...):

>>> for chunk in re.split(r'\nstop', data):
...     m = re.search('^start.*^x=Dog', chunk, re.DOTALL | re.MULTILINE)
...     if m: print repr(m.group())
...
'start #frag 1 start \nx=Dog'
'start #frag 3 start \nx=Dog'

--
Arnaud

--
http://mail.python.org/mailman/listinfo/python-list
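A Python 3 sketch of the two-step idea (split on stop, then search each chunk), next to a plain line-by-line scanner in the spirit of not using regexps at all. The function names and fragment 4 in the sample are hypothetical. On this sample both return the same three fragments, but they are not equivalent in general: the greedy DOTALL regex runs to the last x=Dog in a chunk, while the scanner stops at the first.

```python
import re

# Thread sample plus a hypothetical fragment 4 with a line between start and x=Dog.
SAMPLE = """start #frag 1 start
x=Dog # frag 1 end
stop
start # frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
x=Dog #frag 3 end
stop
start #frag 4 start
y=1
x=Dog #frag 4 end
stop"""


def dog_fragments_regex(text):
    """Split on stop lines, then look for start ... x=Dog inside each chunk."""
    results = []
    for chunk in re.split(r'\nstop', text):
        # DOTALL lets ".*" cross newlines; MULTILINE makes "^" match at line starts.
        m = re.search(r'^start.*^x=Dog', chunk, re.DOTALL | re.MULTILINE)
        if m:
            results.append(m.group())
    return results


def dog_fragments_lines(text):
    """Same idea without regexps: collect lines from start until x=Dog or stop."""
    results = []
    current = None  # lines of the fragment being collected, None outside one
    for line in text.splitlines():
        if line.startswith('start'):
            current = [line]
        elif current is not None:
            if line.startswith('x=Dog'):
                current.append('x=Dog')  # keep only up to "x=Dog", like the regex
                results.append('\n'.join(current))
                current = None
            elif line.startswith('stop'):
                current = None  # fragment ended without x=Dog
            else:
                current.append(line)
    return results


for frag in dog_fragments_regex(SAMPLE):
    print(repr(frag))
```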