Thanks. I'm thinking the choice might be between lxml and Beautiful Soup, but since BS can use lxml as its parser, I'm trying to figure out how the two actually differ. I don't necessarily need the simplest option (html.parser), but I'd like one that is simple enough yet powerful enough that I won't have to learn another tool later.
On Tue, Mar 6, 2012 at 5:35 PM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 4:05 PM, John Salerno <johnj...@gmail.com> wrote:
>>> Anything that allows me NOT to use REs is welcome news, so I look forward
>>> to learning about something new! :)
>>
>> I should ask though...are there alternatives already bundled with Python
>> that I could use? Now that you mention it, I remember something called
>> HTMLParser (or something like that) and I have no idea why I never looked
>> into that before I messed with REs.
>
> HTMLParser is pretty basic, although it may be sufficient for your
> needs. It just converts an html document into a stream of start tags,
> end tags, and text, with no guarantee that the tags will actually
> correspond in any meaningful way. lxml can be used to output an
> actual hierarchical structure that may be easier to manipulate and
> extract data from.
>
> Cheers,
> Ian

-- 
http://mail.python.org/mailman/listinfo/python-list
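To make Ian's description of HTMLParser concrete, here is a minimal sketch. It assumes Python 3, where the module is `html.parser` (in Python 2 it was the top-level `HTMLParser` module). `EventLogger` is a hypothetical subclass name used only for illustration. It records each callback as a flat event and feeds in deliberately mis-nested markup, showing that the parser reports the tags in order without checking that they correspond.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical subclass: record each parser callback as an (event, value) tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; ignored in this sketch
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only text between tags
        if data.strip():
            self.events.append(("text", data.strip()))


# Deliberately malformed: <b> is closed after its parent </p>
doc = "<p>Hello <b>world</p></b>"
parser = EventLogger()
parser.feed(doc)
parser.close()
for event in parser.events:
    print(event)
```

You get back a flat list of start/text/end events in document order, with no tree and no error for the mismatched `</p>`. Building any hierarchy is left to your own code, which is the gap Ian says lxml fills.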