On Wed, Dec 2, 2009 at 7:24 PM, Mark G <markgraha...@gmail.com> wrote:
> Hi all,
>
> I am new to Python and don't yet know the libraries well. What would
> be the best way to approach this problem: I have an HTML file parsing
> script - the file sits on my hard drive. I want to extract the date
> modified from the meta-data. Should I read through lines of the file
> doing a string.find to look for the character patterns of the
> meta-tag, or should I use a DOM type library to retrieve the HTML
> element I want? Which is best practice? Which occupies least code?

You can probably do some string.find calls and they might work almost
always, but HTML is funky and quite often coded badly by bad people, so
pattern matching on raw text will break sooner or later. And I would
never personally suggest anyone go anywhere near a DOM library; their
life will never be happy again :)

I'd get lxml, even though you're not directly using XML. It has an html
package in it too; it's fast, astoundingly easy to use, and
fantastically featureful for future growth :)

http://codespeak.net/lxml/lxmlhtml.html

--S
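
For illustration, here is a minimal sketch of the lxml approach. It assumes the date sits in a tag like <meta name="last-modified" content="...">; that tag name, the helper name meta_content, and the sample HTML are all made up for the example, so adjust them to whatever your file really contains. The html.parser fallback is only there so the sketch still runs where lxml isn't installed.

```python
from html.parser import HTMLParser

try:
    import lxml.html  # third-party: pip install lxml
except ImportError:
    lxml = None  # fall back to the standard library below


class _MetaFinder(HTMLParser):
    """Stdlib fallback: keeps the content of the first matching <meta>
    tag that has a content attribute."""

    def __init__(self, name):
        super().__init__()
        self.name = name.lower()
        self.content = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag == "meta" and self.content is None:
            attr_map = dict(attrs)
            if (attr_map.get("name") or "").lower() == self.name:
                self.content = attr_map.get("content")


def meta_content(html_text, name):
    """Return the content of the first <meta name=...> tag that has a
    content attribute, or None if there is no such tag.

    `name` is matched case-insensitively.
    """
    wanted = name.lower()
    if lxml is not None:
        doc = lxml.html.fromstring(html_text)
        for meta in doc.iter("meta"):
            if (meta.get("name") or "").lower() == wanted:
                content = meta.get("content")
                if content is not None:
                    return content
        return None
    finder = _MetaFinder(wanted)
    finder.feed(html_text)
    return finder.content


if __name__ == "__main__":
    # Hypothetical sample; a real file may use a different meta name.
    sample = (
        '<html><head><meta name="last-modified" content="2009-12-01">'
        "</head><body><p>hello</p></body></html>"
    )
    print(meta_content(sample, "last-modified"))
```

For a file on disk, read it first, e.g. open("page.html", encoding="utf-8").read(), and pass the text to meta_content. Either parser copes with sloppy markup that a plain string.find would trip over, such as different attribute order, different quoting, or mixed-case names.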
-- http://mail.python.org/mailman/listinfo/python-list