I use lxml.html. Just as good, and MUCH faster. A pain to install, though.

On Tue, Oct 20, 2009 at 6:32 PM, Anand Balachandran Pillai <abpil...@gmail.com> wrote:
> On Thu, Sep 10, 2009 at 7:44 PM, Puneet Aggarwal <look4pun...@gmail.com> wrote:
>
>> Thanks all for the suggestions. I think I will start with BeautifulSoup
>> (3.0.7a) and will experiment with the other suggested libraries if it
>> does not fit my requirements or if I face issues with it.
>
> You are not going to believe this, but the creator of BeautifulSoup
> (Leonard Richardson) advised me to use the SGMLParser class from
> Python's sgmllib module for parsing HTML. This was back in 2004 (or 2005),
> when I had written to him about using BeautifulSoup as the parser in
> HarvestMan. He advised me to derive a wrapper from SGMLParser, and that's
> what I did.
>
> In case you are interested, you can check out the HTML parser used in
> HarvestMan. It is available at
>
> http://harvestman-crawler.googlecode.com/svn/trunk/HarvestMan/harvestman/lib/pageparser.py
>
>> On Thu, Sep 10, 2009 at 7:07 PM, Baishampayan Ghose <b.gh...@gmail.com> wrote:
>>
>>> > Can anyone suggest a good library for HTML parsing in Python?
>>> > I googled and found a few libraries: BeautifulSoup, HTMLParser,
>>> > SGMLParser, etc.
>>> >
>>> > Can anyone suggest which one I should go for, from your experience?
>>>
>>> BeautifulSoup was OK, but now it's broken. Use lxml; it's very good.
>>>
>>> http://codespeak.net/lxml/
>>>
>>> Regards,
>>> BG
>>>
>>> --
>>> Baishampayan Ghose
>>> b.ghose at gmail.com
>>> _______________________________________________
>>> BangPypers mailing list
>>> BangPypers@python.org
>>> http://mail.python.org/mailman/listinfo/bangpypers
>
> --
> --Anand

--
Yuvi Panda T
http://yuvisense.net
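For readers on Python 3: the sgmllib module, and with it SGMLParser, was removed in Python 3.0, and html.parser.HTMLParser in the standard library is the nearest equivalent. The following is a minimal sketch of the "derive a wrapper from the parser class" approach Anand describes, using HTMLParser instead of SGMLParser. It collects link-like attributes and resolves them against a base URL. LinkExtractor, extract_links, and the URL_ATTRS table are hypothetical names chosen for illustration; this is not HarvestMan's actual pageparser code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical table: which attribute carries a URL for each tag of interest.
URL_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src"}


class LinkExtractor(HTMLParser):
    """Collect URLs from an HTML page, resolved against base_url.

    Hypothetical helper in the spirit of a parser-subclass wrapper;
    it is not HarvestMan's pageparser code.
    """

    def __init__(self, base_url):
        super().__init__()  # convert_charrefs=True is the default
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and unescapes
        # attribute values; attrs is a list of (name, value) pairs and
        # value may be None for valueless attributes.
        wanted = URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Feed a whole document to a fresh parser and return its links."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<a href="/about">About</a> <img src="logo.png"> <a name="top">Top</a>'
    print(extract_links(page, "http://example.com/docs/"))
    # prints ['http://example.com/about', 'http://example.com/docs/logo.png']
```

Self-closing tags such as <img src="x"/> also reach handle_starttag, because HTMLParser's default handle_startendtag calls it. If lxml is installed, lxml.html's own link helpers (for example, the iterlinks() method on parsed elements) can do the same job; the stdlib version simply avoids the extra dependency.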