The entire code is 40 lines and uses Python's built-in HTML parser.
It will not be a problem to maintain. Actually, we could even use
this to simplify both XML(...,sanitize) and gluon.contrib.markdown.WIKI.
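
To give a sense of why a parser like this stays small, here is a minimal sketch of the general idea. It is not the gluon.html code: `Node`, `TreeBuilder`, and `VOID_TAGS` are hypothetical names, the matching rules are simplified assumptions, and it targets Python 3 (`html.parser`), whereas the Python 2 used in this thread imports the same class from `HTMLParser`.

```python
import re
from html.parser import HTMLParser

# Tags that never take a closing tag (assumed, incomplete list).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """Hypothetical stand-in for a helper: a tag, attributes, children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or {})
        self.children = []  # mix of Node objects and plain strings

    def elements(self, tag=None, find=None, **attrs):
        """Return all descendants matching tag, attributes and/or text."""
        found = []
        for child in self.children:
            if not isinstance(child, Node):
                continue
            if child._matches(tag, find, attrs):
                found.append(child)
            found.extend(child.elements(tag, find, **attrs))
        return found

    def _matches(self, tag, find, attrs):
        if tag is not None and self.tag != tag:
            return False
        # Keyword _id="x" is compared against the HTML attribute id="x".
        for key, value in attrs.items():
            if self.attrs.get(key.lstrip("_")) != value:
                return False
        if find is not None:
            texts = [c for c in self.children if isinstance(c, str)]
            if isinstance(find, str):
                return any(find in t for t in texts)
            return any(find.search(t) for t in texts)  # compiled regex
        return True

    def __str__(self):
        attrs = "".join(' %s="%s"' % kv for kv in self.attrs.items())
        inner = "".join(str(c) for c in self.children)
        return "<%s%s>%s</%s>" % (self.tag, attrs, inner, self.tag)


class TreeBuilder(HTMLParser):
    """Feeds the text to the stdlib parser and keeps a stack of open tags."""

    def __init__(self, text):
        super().__init__()
        self.tree = Node("root")
        self._stack = [self.tree]
        self.feed(text)
        self.close()

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self._stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open tag; ignore stray end tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        self._stack[-1].children.append(data)


# Made-up sample page, so the sketch runs without network access.
page = ('<html><head><title>Albert Einstein - Biography</title></head>'
        '<body><div id="Einstein">Born 1879</div></body></html>')
tree = TreeBuilder(page).tree
title = tree.elements("title")[0]
print(title)  # <title>Albert Einstein - Biography</title>
print(len(tree.elements(find=re.compile("Einstein"))))  # 1
```

The stack of open tags is what keeps the builder short: each start tag is attached to the node on top of the stack, and each end tag pops back to its match. Producing real web2py helpers instead of `Node` objects would be a matter of what `handle_starttag` constructs.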

On May 25, 12:50 am, Thadeus Burgess <thade...@thadeusb.com> wrote:
> > So why our own?
>
> Because it converts it into web2py helpers.
>
> And you don't have to deal with installing anything other than web2py.
>
> --
> Thadeus
>
> On Tue, May 25, 2010 at 12:14 AM, Kevin Bowling <kevin.bowl...@gmail.com> wrote:
> > Hmm, I wonder if this is worth the possible maintenance cost?  It also
> > transcends the role of a web framework and now you are getting into
> > network programming.
>
> > I have a currently deployed screen scraping app and found PyQuery to
> > be more than adequate.  There is also lxml directly, or Beautiful
> > Soup.  A simple import away and they integrate with web2py or anything
> > else just fine.  So why our own?
>
> > Regards,
> > Kevin
>
> > On May 24, 9:35 pm, mdipierro <mdipie...@cs.depaul.edu> wrote:
> >> New in trunk. Screen scraping capabilities.
>
> >> Example:
> >>
> >> >>> import re
> >> >>> from gluon.html import web2pyHTMLParser
> >> >>> from urllib import urlopen
> >> >>> html=urlopen('http://nobelprize.org/nobel_prizes/physics/laureates/1921/einstein-bi...()
> >> >>> tree=web2pyHTMLParser(html).tree  ### NEW!!
> >> >>> elements=tree.elements('div') # search by tag type
> >> >>> elements=tree.elements(_id="Einstein") # search by attribute value (id for example)
> >> >>> elements=tree.elements(find='Einstein') # text search NEW!!
> >> >>> elements=tree.elements(find=re.compile('Einstein')) # search via regex NEW!!
> >> >>> print elements[0]
> >> <title>Albert Einstein - Biography</title>
> >> >>> print elements[0][0]
> >> Albert Einstein - Biography
> >> >>> elements[0].append(SPAN(' modified'))
> >> <title>Albert Einstein - Biography<span>modified</span></title>
> >> >>> print tree
> >> <html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
> >> <head>
> >>   <title>Albert Einstein - Biography<span>modified</span></title>
> >> ...
