I have not tried it, but I assume the built-in Python module HTMLParser already handles at least (1), tags like <input />. I am not sure about (2) and (3).
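
A minimal sketch of that idea, assuming Python 3 (where the module lives at html.parser). It is not the code in parsehtml.py; the names Node, TreeBuilder, parse and VOID_TAGS are made up for illustration, and the output is only a string that looks like web2py helper calls, not real helper objects. Note that HTML has no backslash escaping inside attribute values, so case (3) is written here with &quot; instead; how the parser treats the backslash form is not checked.

```python
from html.parser import HTMLParser

# Hypothetical, incomplete list of tags that never have a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """Illustrative tree node; its repr mimics web2py helper syntax."""

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.children = []

    def __repr__(self):
        parts = [repr(child) for child in self.children]
        # web2py helpers take HTML attributes as keywords prefixed with "_".
        parts += ["_%s=%r" % (k, v) for k, v in self.attrs.items()]
        return "%s(%s)" % (self.tag.upper(), ", ".join(parts))


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # Called for self-closing tags such as <input />.
        self.stack[-1].children.append(Node(tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root.children


if __name__ == "__main__":
    print(parse('<div>xxx</div>'))                      # [DIV('xxx')]
    print(parse('<input />'))                           # [INPUT()]
    print(parse('<a onclick="if(a>b)alert()">go</a>'))  # [A('go', _onclick='if(a>b)alert()')]
```

With this sketch, (1) and (2) come out as expected, and the &quot; form of (3) is unescaped by the parser to var a="x".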

On May 24, 4:32 am, mdipierro <mdipie...@cs.depaul.edu> wrote:
> Hmmm... somehow I did not save the comments in the file.
>
> This does not handle well:
>
> 1) tags like <input />
> 2) attributes that contain > in quotes: <a onclick="if(a>b)alert()">
> 3) attributes that contain escaped quotes: <a onclick="var a=\"x\"">
>
> On May 23, 10:46 am, Massimo Di Pierro <mdipie...@cs.depaul.edu> wrote:
> > Anybody interested in helping with this?
> >
> > It scrapes an HTML file and converts it into a tree hierarchy of web2py helpers:
> >
> > '<div>xxx</div>' -> DIV('xxx')
> >
> > It kind of works but fails on the three exceptions described in the file.
> >
> > Massimo
> >
> > parsehtml.py