The entire code is 40 lines and uses Python's built-in HTML parser, so maintaining it will not be a problem. Actually, we could even use it to simplify both XML(...,sanitize) and gluon.contrib.markdown.WIKI.
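
To give a rough idea of why a tree builder on top of the built-in parser stays small, here is a minimal sketch. It is not the code in trunk: `Node`, `TreeBuilder` and `VOID_TAGS` are hypothetical names made up for illustration, it builds plain objects rather than web2py helpers, and it is written for Python 3 (`html.parser`; under Python 2 the module is `HTMLParser`). It only assumes the same style of search: by tag, by `_attribute`, and by text or regex via `find`.

```python
import re
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not stay open on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """Hypothetical minimal element: a tag, its attributes, its children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or {})
        self.children = []  # mix of Node objects and plain strings

    def text(self):
        """Concatenate all text found below this node."""
        return "".join(
            c.text() if isinstance(c, Node) else c for c in self.children
        )

    def elements(self, tag=None, find=None, **attrs):
        """Return descendants matching tag, attributes (_id='x') and text."""
        found = []
        for child in self.children:
            if not isinstance(child, Node):
                continue
            ok = tag is None or child.tag == tag
            ok = ok and all(
                child.attrs.get(k.lstrip("_")) == v for k, v in attrs.items()
            )
            if ok and find is not None:
                # Assumption: match only text sitting directly in the element.
                own = "".join(c for c in child.children if isinstance(c, str))
                if isinstance(find, str):
                    ok = find in own
                else:
                    ok = find.search(own) is not None
            if ok:
                found.append(child)
            found.extend(child.elements(tag, find, **attrs))
        return found


class TreeBuilder(HTMLParser):
    """Feed HTML in, get a Node tree out as .tree."""

    def __init__(self, html):
        super().__init__()
        self.tree = Node("root")
        self._stack = [self.tree]
        self.feed(html)
        self.close()

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self._stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag; ignore strays.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        self._stack[-1].children.append(data)


sample = (
    "<html><head><title>Albert Einstein - Biography</title></head>"
    '<body><div id="Einstein"><p>Physics<br>1921</p></div></body></html>'
)
tree = TreeBuilder(sample).tree
print(tree.elements("title")[0].text())  # Albert Einstein - Biography
print(len(tree.elements(_id="Einstein")))  # 1
print(tree.elements(find=re.compile("Ein.tein"))[0].tag)  # title
```

The stack of open elements is the whole trick: start tags push, end tags pop back to the matching element, and text is attached to whatever is on top. Everything else is just the recursive search.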
On May 25, 12:50 am, Thadeus Burgess <thade...@thadeusb.com> wrote:
> > So why our own?
>
> Because it converts it into web2py helpers.
>
> And you don't have to deal with installing anything other than web2py.
>
> --
> Thadeus
>
> On Tue, May 25, 2010 at 12:14 AM, Kevin Bowling <kevin.bowl...@gmail.com> wrote:
> > Hmm, I wonder if this is worth the possible maintenance cost? It also
> > transcends the role of a web framework and now you are getting into
> > network programming.
> >
> > I have a currently deployed screen scraping app and found PyQuery to
> > be more than adequate. There is also lxml directly, or Beautiful
> > Soup. A simple import away and they integrate with web2py or anything
> > else just fine. So why our own?
> >
> > Regards,
> > Kevin
> >
> > On May 24, 9:35 pm, mdipierro <mdipie...@cs.depaul.edu> wrote:
> >> New in trunk. Screen scraping capabilities.
> >>
> >> Example:
> >> >>> import re
> >> >>> from gluon.html import web2pyHTMLParser
> >> >>> from urllib import urlopen
> >> >>> html=urlopen('http://nobelprize.org/nobel_prizes/physics/laureates/1921/einstein-bi...()
> >> >>> tree=web2pyHTMLParser(html).tree ### NEW!!
> >> >>> elements=tree.elements('div') # search by tag type
> >> >>> elements=tree.elements(_id="Einstein") # search by attribute value (id for example)
> >> >>> elements=tree.elements(find='Einstein') # text search NEW!!
> >> >>> elements=tree.elements(find=re.compile('Einstein')) # search via regex NEW!!
> >> >>> print elements[0]
> >> <title>Albert Einstein - Biography</title>
> >> >>> print elements[0][0]
> >> Albert Einstein - Biography
> >> >>> elements[0].append(SPAN(' modified'))
> >> <title>Albert Einstein - Biography<span>modified</span></title>
> >> >>> print tree
> >> <html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
> >> <head>
> >> <title>Albert Einstein - Biography<span>modified<span></title>
> >> ...