gervaz wrote:
> On Jan 19, 4:01 pm, Ant <ant...@gmail.com> wrote:
>> A 0-width positive lookahead is probably what you want here:
>>
>> >>> s = """
>> ... hdhd <a href="http://mysite.com/blah.html">Test <i>String</i> OK</a>
>> ...
>> ... """
>> >>> p = r'href="(http://mysite.com/[^"]+)">(.*)(?=</a>)'
>> >>> m = re.search(p, s)
>> >>> m.group(1)
>> 'http://mysite.com/blah.html'
>> >>> m.group(2)
>> 'Test <i>String</i> OK'
>>
>> The (?=...) bit is the lookahead, and won't consume any of the string
>> you are searching. I've binned the named groups for clarity.
>>
>> The beautiful soup answers are a better bet though - they've already
>> done the hard work, and after all, you are trying to roll your own
>> partial HTML parser here, which will struggle with badly formed html...
>
> Ok, thank you all, I'll take a look at beautiful soup, albeit the
> lookahead solution fits better for the little I have to do.
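For what it's worth, here is a small sketch of how the quoted pattern behaves once the input grows a little. The second link, the LAZY variant and the link_text helper are my own assumptions for illustration, not something from the thread. The greedy (.*) runs to the last </a> on the line, so a second link is enough to change what group(2) returns:

```python
import re

# Pattern quoted above: greedy (.*) followed by a zero-width lookahead for </a>.
GREEDY = r'href="(http://mysite.com/[^"]+)">(.*)(?=</a>)'
# Hypothetical variant (not from the thread): non-greedy (.*?) stops at the first </a>.
LAZY = r'href="(http://mysite.com/[^"]+)">(.*?)(?=</a>)'

# The sample line from the thread, then the same line with a made-up second link.
one_link = 'hdhd <a href="http://mysite.com/blah.html">Test <i>String</i> OK</a>'
two_links = one_link + ' and <a href="http://mysite.com/other.html">More</a>'


def link_text(pattern, html):
    """Return the captured link text (group 2), or None if nothing matches."""
    m = re.search(pattern, html)
    return m.group(2) if m else None


print(link_text(GREEDY, one_link))   # link text only
print(link_text(GREEDY, two_links))  # runs on into the second link
print(link_text(LAZY, two_links))    # link text only
```

The non-greedy variant only covers this one case; nested or malformed markup will still trip it up.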

Little things tend to get out of hand quickly... This is the reason why
so many gave you the hint.

Diez
--
http://mail.python.org/mailman/listinfo/python-list