After a bit of reading, I've decided to use Beautiful Soup 4, with lxml as the parser. I considered simply using lxml to do all the work, but I just got lost in the documentation and tutorials. I couldn't find a clear explanation of how to parse an HTML file and then navigate its structure.
The Beautiful Soup 4 documentation, by contrast, was very clear, and BS4 itself is simple and Pythonic. Best of all, version 4 no longer does the parsing itself, so you can choose your own parser. It works with lxml, so I'll still be using lxml, just with a nice, clean layer on top for navigating the tree structure. Thanks for the advice!
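
P.S. For anyone who finds this thread later, here is a minimal sketch of the combination, assuming both the bs4 and lxml packages are installed. The sample HTML and the make_soup helper are made up for illustration and are not part of either library; the fallback to Python's built-in "html.parser" is only there so the snippet still runs when lxml is missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Made-up sample document, purely for illustration.
HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <div id="content">
      <p class="intro">Hello, <a href="/one">first link</a>.</p>
      <p>Another <a href="/two">second link</a>.</p>
    </div>
  </body>
</html>
"""


def make_soup(markup):
    """Hypothetical helper: parse with lxml, else fall back to html.parser."""
    try:
        # The second argument names the parser Beautiful Soup should use.
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # Raised by Beautiful Soup when the requested parser is not installed.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup(HTML)

# Navigate by tag name: soup.title is the first <title> element.
title = soup.title.string

# Search the whole tree and read an attribute from each match.
links = [a["href"] for a in soup.find_all("a")]

# Filter by CSS class, then move up the tree with .parent.
intro = soup.find("p", class_="intro")
intro_text = intro.get_text()
parent_id = intro.parent["id"]

print(title)
print(links)
print(intro_text)
print(parent_id)
```

Swapping parsers only changes the second argument to BeautifulSoup; the navigation code stays the same, which is the part I was after.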