On 2011.05.16 02:26 AM, Karim wrote:
> Use regular expressions for bad HTML, or BeautifulSoup (google it). Below
> is an example that extracts all HTML links:
>
> linksList = re.findall('<a href=(.*?)>.*?</a>', htmlSource)
> for link in linksList:
>     print(link)

I was afraid I might have to use regexes (mostly because I could never understand them). Even the BeautifulSoup website itself admits it's awful with Python 3: only the admittedly broken 3.1.0 release works with Python 3 at all. ElementTree doesn't seem to have been updated in a long time, so I'll assume it won't work with Python 3. lxml looks promising, but it doesn't say anywhere whether it works on Python 3 or not, which is puzzling since the latest release was only a couple of months ago.
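For the record, Karim's regex suggestion can be written for Python 3 roughly like this. It's only a sketch, not a real HTML parser: it assumes links show up as `<a ... href=...>` with the URL in single quotes, double quotes, or no quotes, and the names `LINK_RE` and `extract_links` are my own, made up for illustration. Unlike the quoted pattern, it strips the quotes from the captured URL and doesn't need the closing `</a>` to be on the same line.

```python
import re

# Rough heuristic, not a parser: matches an opening <a ...> tag (any case)
# and captures its href value, with or without surrounding quotes.
# Assumes no spaces around "=" and no quotes inside the URL itself.
LINK_RE = re.compile(r'<a\s[^>]*?href=["\']?([^"\'\s>]+)["\']?[^>]*>',
                     re.IGNORECASE)


def extract_links(html_source):
    """Return the href values of all <a> tags found in html_source."""
    return LINK_RE.findall(html_source)


html_source = ('<p><a href="http://example.com/a">A</a> '
               '<A HREF=\'/b\' class="x">B</A></p>')
for link in extract_links(html_source):
    print(link)
```

Regexes like this still trip over comments, scripts, and tags split oddly across lines, so treat it as a quick hack for reasonably well-behaved pages.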
Actually, if I'm going to use regexes, I might as well try to implement Versions* in Python. Thanks for the answers!

*http://en.totalcmd.pl/download/wfx/net/Versions (original, made for Total Commander) and https://addons.mozilla.org/en-US/firefox/addon/versions-wfx_versions/ (clone implemented as a Firefox add-on; it's so wonderful, I even wrote the docs for it!)

-- 
http://mail.python.org/mailman/listinfo/python-list