I cannot push it until tonight, but I have this:

>>> a = TAG('<h1>Header</h1><p>this is a test</p>')
>>> print a
<h1>Header</h1><p>this is a test</p>
>>> a.flatten()
'Headerthis is a test'
>>> a.flatten(filter=lambda x: re.sub('\s+',' ',x))
'Headerthis is a test'
>>> a.flatten(filter=lambda x: re.sub('\s+','-',x))
'Headerthis-is-a-test'
>>> a.flatten(render=dict(h1=lambda x: '#'+x+'\n\n'), filter=lambda x: x.replace(' ','-'))
'#Header\n\nthis-is-a-test'
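The session suggests that `filter` transforms each text node and `render` transforms each tag, looked up by tag name. A minimal Python 3 sketch of those semantics follows, assuming a hypothetical `Node` class and `flatten` function (not web2py's actual TAG or its implementation). The argument names `filter` and `render` simply mirror the ones in the session. It reproduces the outputs shown above.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union


@dataclass
class Node:
    """Hypothetical stand-in for a web2py helper: a tag name plus children.

    An empty tag name marks the root of a parsed fragment.
    """
    tag: str
    children: List[Union["Node", str]] = field(default_factory=list)


def flatten(node: Node,
            filter: Optional[Callable[[str], str]] = None,
            render: Optional[Dict[str, Callable[[str], str]]] = None) -> str:
    """Concatenate the text under `node` (hypothetical sketch).

    `filter` is applied to every text node; `render[tag]`, if present, is
    applied to the already-flattened content of each element with that tag.
    """
    render = render or {}
    parts = []
    for child in node.children:
        if isinstance(child, str):
            # Text node: run it through the filter, if any.
            parts.append(filter(child) if filter else child)
        else:
            # Element: flatten its subtree recursively.
            parts.append(flatten(child, filter, render))
    text = "".join(parts)
    renderer = render.get(node.tag)
    return renderer(text) if renderer else text


# Same fragment as in the session: <h1>Header</h1><p>this is a test</p>
doc = Node("", [Node("h1", ["Header"]), Node("p", ["this is a test"])])

print(repr(flatten(doc)))  # 'Headerthis is a test'
print(repr(flatten(doc, filter=lambda x: re.sub(r"\s+", "-", x))))  # 'Headerthis-is-a-test'
print(repr(flatten(doc,
                   render={"h1": lambda x: "#" + x + "\n\n"},
                   filter=lambda x: x.replace(" ", "-"))))  # '#Header\n\nthis-is-a-test'
```

In this sketch the renderer runs after the filter has already been applied to the text beneath it. That is why the `'\n\n'` added for `h1` is not turned into dashes. Whether web2py applies the two in the same order is an assumption here.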
filter is applied to text and render is applied to tags. So your

result = web2pyHTMLParser(form.vars.input).tree

could be written as

result = TAG(form.vars.input).flatten(filter=lambda x: re.sub('\s+',' ',x), render=dict(br=lambda x: '\n', p=lambda x: x+'\n'))

Can somebody propose better names for "filter" and "render"? I could not come up with anything better.

Massimo

On May 25, 10:24 am, Iceberg <iceb...@21cn.com> wrote:
> Hi Massimo, good to know you finally made it! :-)
>
> Although I am not sure where and when to use this new feature, I came up
> with an HTML optimizer similar to [1] in a dozen lines of web2py code [2].
>
> [1] http://www.iwebtool.com/html_optimizer
>
> [2] Put this inside your controller.
>
> def easter():  # This code is released into the public domain
>     from gluon.html import web2pyHTMLParser
>     form = FORM(
>         TEXTAREA(_name='input'), BR(),
>         INPUT(_type='submit', _value='Optimize!'), )
>     result = ''
>     if form.accepts(request.vars, keepvalues=True):
>         result = web2pyHTMLParser(form.vars.input).tree
>     return {'': DIV(
>         'Insert your HTML code to optimize:',
>         form,
>         FIELDSET(PRE(str(result))), )}
>
> Well, not exactly an HTML optimizer, because our version does not
> strip spaces inside text content. Just for fun.
>
> Regards,
> Iceberg
>
> On May 25, 4:27 am, mdipierro <mdipie...@cs.depaul.edu> wrote:
> > Good suggestion. Now you can do
> >
> > >>> from gluon.html import web2pyHTMLParser
> > >>> tree = web2pyHTMLParser('hello<div a="b">world</div>').tree
> > >>> tree.element(_a='b')['_c'] = 5
> > >>> str(tree)
> > 'hello<div a="b" c="5">world</div>'
> >
> > Works great!
> >
> > On May 24, 5:11 am, Iceberg <iceb...@21cn.com> wrote:
> > > I did not try it, but I assume the builtin Python module HTMLParser
> > > already handles at least (1) tags like <input />; not sure about (2)
> > > and (3).
> > >
> > > On May 24, 4:32 am, mdipierro <mdipie...@cs.depaul.edu> wrote:
> > > > hmmm.... somehow I did not save comments in the file.
> > > >
> > > > This does not handle well:
> > > >
> > > > 1) tags like <input />
> > > > 2) attributes that contain > in quotes: <a onclick="if(a>b)alert()">
> > > > 3) attributes that contain escaped quotes: <a onclick="var a=\"x\"">
> > > >
> > > > On May 23, 10:46 am, Massimo Di Pierro <mdipie...@cs.depaul.edu> wrote:
> > > > > Anybody interested in helping with this?
> > > > >
> > > > > It scrapes an HTML file and converts it into a tree hierarchy of web2py
> > > > > helpers:
> > > > >
> > > > > '<div>xxx</div>' -> DIV('xxx')
> > > > >
> > > > > It kind of works but fails on the three cases described in the file.
> > > > >
> > > > > Massimo
> > > > >
> > > > > parsehtml.py (1K)
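Iceberg's guess about the builtin parser can be checked with Python 3's stdlib `html.parser` (the module was named `HTMLParser` in Python 2). What follows is a minimal sketch, assuming a hypothetical `Element` class and `TreeBuilder` subclass. It builds a plain tree rather than web2py helpers, and it is not how parsehtml.py or web2pyHTMLParser is implemented. The sketch covers case (1), including void tags written without the slash, and case (2). Case (3) is not attempted: HTML has no backslash escape inside attribute values (the standard form is `&quot;`), so how the parser treats `\"` is not shown here.

```python
from html.parser import HTMLParser

# HTML void elements never get a closing tag, so they must not be pushed
# on the open-element stack (assumption: this list is enough for the demo).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Element:
    """Hypothetical minimal tree node: tag name, attribute dict, children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []

    def __repr__(self):
        return f"{self.tag.upper()}({self.children!r}, {self.attrs!r})"


class TreeBuilder(HTMLParser):
    """Hypothetical builder: turns parser callbacks into a nested Element tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Element("")      # empty tag marks the fragment root
        self.stack = [self.root]     # currently open elements

    def handle_starttag(self, tag, attrs):
        node = Element(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:     # e.g. <br> without a slash stays a leaf
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # Case (1): <input /> arrives here and is added as a leaf.
        self.stack[-1].children.append(Element(tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


print(parse('<div>xxx</div>').children)
print(parse('<p>a<input />b</p>').children)
# Case (2): the quoted '>' stays inside the attribute value.
print(parse('<a onclick="if(a>b)alert()">go</a>').children)
```

The stack-based design keeps the nesting logic in `handle_starttag`/`handle_endtag`, while tokenizing (quoted attributes, self-closing tags) is left to the stdlib parser. Mapping each `Element` to the matching web2py helper would be a separate step.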