Re: What's the best way to write this regular expression?

John Salerno Tue, 06 Mar 2012 15:51:28 -0800

Thanks. I think the choice comes down to lxml or Beautiful Soup, but
since BS can use lxml as its parser, I'm trying to figure out what the
actual difference between them is. I don't necessarily need the
simplest option (html.parser), but I want one that is simple enough to
pick up now yet powerful enough that I won't have to learn another
library later.

On Tue, Mar 6, 2012 at 5:35 PM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Tue, Mar 6, 2012 at 4:05 PM, John Salerno <johnj...@gmail.com> wrote:
>>> Anything that allows me NOT to use REs is welcome news, so I look forward 
>>> to learning about something new! :)
>>
>> I should ask though...are there alternatives already bundled with Python 
>> that I could use? Now that you mention it, I remember something called 
>> HTMLParser (or something like that) and I have no idea why I never looked 
>> into that before I messed with REs.
>
> HTMLParser is pretty basic, although it may be sufficient for your
> needs.  It just converts an html document into a stream of start tags,
> end tags, and text, with no guarantee that the tags will actually
> correspond in any meaningful way.  lxml can be used to output an
> actual hierarchical structure that may be easier to manipulate and
> extract data from.
>
> Cheers,
> Ian
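
For reference, here is a minimal sketch of the "stream of start tags,
end tags, and text" behaviour Ian describes. It assumes Python 3, where
the module is html.parser (in Python 2 it was HTMLParser). The
EventLogger class and parse_events helper are names made up for this
illustration; they are not part of the standard library.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper: record each parser callback as a tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only text to keep the log readable.
        if data.strip():
            self.events.append(("text", data.strip()))


def parse_events(html):
    """Feed a complete document and return the recorded events."""
    parser = EventLogger()
    parser.feed(html)
    parser.close()
    return parser.events


# The closing tags are deliberately in the wrong order.
events = parse_events("<p>Hello <b>world</p></b>")
for event in events:
    print(event)
```

The parser reports the mismatched closing tags in the order they appear
and does not try to repair the nesting, so matching tags up is left to
the caller. A tree-building library such as lxml takes on that work.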
-- 
http://mail.python.org/mailman/listinfo/python-list
