[web2py] Re: parsehtml

Iceberg Mon, 24 May 2010 03:11:38 -0700

I have not tried it, but I assume the built-in Python module HTMLParser
already handles at least (1), tags like <input />. I am not sure about (2)
and (3).
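
A quick way to check this is to feed the three inputs to the parser and
record the callbacks. The following is only a sketch, written against the
Python 3 module html.parser (the Python 2 module was named HTMLParser).
EventCollector and parse are hypothetical names made up for this example.
Note that a backslash is not an escape character in HTML, so for case (3)
the sketch uses the &quot; entity instead of \" and makes no claim about how
the backslash form is parsed.

```python
from html.parser import HTMLParser  # named HTMLParser in Python 2


class EventCollector(HTMLParser):
    """Hypothetical helper: records parser callbacks as tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse(html):
    """Feed one HTML string to a fresh parser and return its events."""
    parser = EventCollector()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    samples = [
        '<input />',                            # case (1)
        '<a onclick="if(a>b)alert()">go</a>',   # case (2)
        '<a onclick="var a=&quot;x&quot;">',    # case (3), entity form
    ]
    for sample in samples:
        print(sample, "->", parse(sample))
```

By default a self-closing tag is reported as a start tag followed by an end
tag, and entity references inside attribute values are unescaped before
handle_starttag is called.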


On May 24, 4:32 am, mdipierro <mdipie...@cs.depaul.edu> wrote:
> hmmm.... somehow I did not save comments in the file.
>
> It does not handle these cases well:
>
> 1) tags like <input />
> 2) attributes that contain > in quotes <a onclick="if(a>b)alert()">
> 3) attributes that contain escaped quotes <a onclick="var a=\"x\"">
>
> On May 23, 10:46 am, Massimo Di Pierro <mdipie...@cs.depaul.edu> wrote:
>
> > Anybody interested in helping with this?
>
> > It scrapes an HTML file and converts it into a tree hierarchy of web2py
> > helpers:
>
> > '<div>xxx</div>' -> DIV('xxx')
>
> > It mostly works but fails in the three cases described in the file.
>
> > Massimo
>
> > Attachment: parsehtml.py (1K)
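
For illustration, the conversion '<div>xxx</div>' -> DIV('xxx') quoted above
could be sketched on top of the standard parser as follows. This is not the
code from parsehtml.py, which is not shown in this thread. It is a rough
sketch under several assumptions: it emits helper source text rather than
real web2py objects, it maps each tag to the upper-cased helper name, it
writes attributes as _name=value keyword arguments, and VOID_TAGS is a
hand-picked, incomplete list. HelperSourceBuilder and html_to_helper_source
are hypothetical names.

```python
from html.parser import HTMLParser  # named HTMLParser in Python 2

# Assumption: a partial list of elements that never take a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HelperSourceBuilder(HTMLParser):
    """Hypothetical sketch: turns HTML into web2py-helper-style source text."""

    def __init__(self):
        super().__init__()
        # Each stack entry is [tag, attrs, children]; the root has tag None.
        self.stack = [[None, [], []]]

    def handle_starttag(self, tag, attrs):
        self.stack.append([tag, attrs, []])
        if tag in VOID_TAGS:
            self._close()

    def handle_endtag(self, tag):
        # Ignore end tags that do not match the open element, including
        # the one the parser synthesizes for <input />.
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self._close()

    def handle_data(self, data):
        if data.strip():
            self.stack[-1][2].append(repr(data))

    def _close(self):
        # Pop the open element and append its source text to its parent.
        tag, attrs, children = self.stack.pop()
        args = children + ["_%s=%r" % (name, value) for name, value in attrs]
        self.stack[-1][2].append("%s(%s)" % (tag.upper(), ", ".join(args)))


def html_to_helper_source(html):
    builder = HelperSourceBuilder()
    builder.feed(html)
    builder.close()
    while len(builder.stack) > 1:  # close anything left open
        builder._close()
    return ", ".join(builder.stack[0][2])


if __name__ == "__main__":
    print(html_to_helper_source('<div>xxx</div>'))
    print(html_to_helper_source('<div><input name="q" /></div>'))
```

The sketch does not handle attribute names that are not valid Python
identifiers (for example data-x), and it silently drops mismatched end tags.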
