"beza1e1" <[EMAIL PROTECTED]> writes:
> I think for a quick hack, this is as good as a parser. A simple parser
> would miss some cases as well. RE are nearly not extendable though, so
> your critic is valid.

Pretty much any first attempt is going to miss some cases. There are
libraries available that have stood the test of time. Simply using one
of those is the right solution.

> The point is, what George wants to do. A mixture would be possible as
> well:
> Getting all <a ...> by a RE and then extracting the url with something
> like a parser.

I thought the point was to extract all URLs? Those appear in
attributes of tags other than A tags. That's a meta-problem: the
parser has to be configured properly to deal with it. But it's *much*
simpler to do if you've got a parser that understands the structure of
HTML - you should be able to specify tag/attribute pairs to look for -
than with something that treats the document as unstructured text.

        <mike
-- 
Mike Meyer <[EMAIL PROTECTED]>  http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
-- 
http://mail.python.org/mailman/listinfo/python-list
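
To illustrate the tag/attribute-pair idea from the reply above, here is a
minimal sketch using html.parser from the Python 3 standard library. The
names URL_ATTRS, URLExtractor and extract_urls are made up for this example,
and the set of pairs is an assumption that is deliberately incomplete; it is
not a list taken from the thread or from any HTML specification.

```python
from html.parser import HTMLParser

# Hypothetical, incomplete set of (tag, attribute) pairs that carry URLs.
# Extend it to cover whatever else the documents at hand use.
URL_ATTRS = {
    ("a", "href"),
    ("link", "href"),
    ("img", "src"),
    ("script", "src"),
    ("form", "action"),
}


class URLExtractor(HTMLParser):
    """Collect attribute values for the configured tag/attribute pairs."""

    def __init__(self, pairs=URL_ATTRS):
        super().__init__()
        self.pairs = pairs
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this,
        # so <IMG SRC=...> matches ("img", "src").
        for name, value in attrs:
            if value and (tag, name) in self.pairs:
                self.urls.append(value)


def extract_urls(html, pairs=URL_ATTRS):
    """Return the URLs found in html, in document order."""
    parser = URLExtractor(pairs)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Looking in a new place is then a matter of adding a pair such as
("frame", "src") to the set; the parsing code itself does not change.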