gervaz wrote:
Hi all, I need to find all the addresses in an HTML source page. I'm
using:
'href="(?P<url>http://mysite.com/[^"]+)">(<b>)?(?P<name>[^</a>]+)(</b>)?</a>'
but the [^</a>]+ pattern matches any run of characters other than <,
/, a or >, although all I want to exclude is the string "</a>". How can I
specify: 'do not match the string "blabla"'?

If the name is followed by "<" then just match the name with [^<]+:

href="(?P<url>http://mysite\.com/[^"]+)">(<b>)?(?P<name>[^<]+)(</b>)?</a>
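
As a quick check, here is a minimal sketch of that pattern with Python's re module. The sample HTML and the variable names are made up for illustration; only the first pattern comes from this message. The second pattern is my own assumption about what the original question was after: a negative lookahead, (?:(?!</a>).)+, which is the usual way to say "any characters, as long as the string </a> does not start here".

```python
import re

# Sample page fragment (hypothetical, for illustration only).
html = (
    '<a href="http://mysite.com/page1.html">First page</a>\n'
    '<a href="http://mysite.com/page2.html"><b>Second page</b></a>\n'
    '<a href="http://mysite.com/page3.html">Third <i>page</i></a>\n'
)

# Pattern from the reply: the name is everything up to the next "<".
link_re = re.compile(
    r'href="(?P<url>http://mysite\.com/[^"]+)">(<b>)?(?P<name>[^<]+)(</b>)?</a>'
)
links = [(m.group("url"), m.group("name")) for m in link_re.finditer(html)]
print(links)
# [('http://mysite.com/page1.html', 'First page'),
#  ('http://mysite.com/page2.html', 'Second page')]

# Negative-lookahead variant: consume one character at a time, but only
# while the text at the current position does not start with "</a>".
lookahead_re = re.compile(
    r'href="(?P<url>http://mysite\.com/[^"]+)">(?P<name>(?:(?!</a>).)+)</a>'
)
names = [m.group("name") for m in lookahead_re.finditer(html)]
print(names)
# ['First page', '<b>Second page</b>', 'Third <i>page</i>']
```

The [^<]+ version skips the third link because its name contains a nested tag, while the lookahead version keeps everything up to </a>, markup included. Which one is right depends on what the pages really look like.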

I've also changed mysite.com to mysite\.com because an unescaped . matches any character (other than a newline, by default), whereas you probably want to match only a literal ".".
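
To illustrate the difference, here is a small sketch; the look-alike URL is a made-up example, not something from the original page.

```python
import re

unescaped = re.compile(r'http://mysite.com/')   # "." matches any character
escaped = re.compile(r'http://mysite\.com/')    # "\." matches only a literal dot

# Hypothetical look-alike address with "X" where the dot should be.
lookalike = 'http://mysiteXcom/'

unescaped_hit = bool(unescaped.match(lookalike))
escaped_hit = bool(escaped.match(lookalike))
print(unescaped_hit)  # True
print(escaped_hit)    # False

# re.escape() adds the backslash for you when the text comes from a variable.
escaped_host = re.escape('mysite.com')
print(escaped_host)   # mysite\.com
```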
--
http://mail.python.org/mailman/listinfo/python-list
