[web2py] Re: parsehtml

2010-06-08 Thread mdipierro
posting... On Jun 8, 2:30 am, Iceberg wrote: > Hi Massimo, did you include it in your latest commit? If so, I guess > you forgot to add a new gluon/decoder.py as well. > > And as a side note, I recommend you maintain two web2py environments for > yourself, one for developing and then commit new featu

[web2py] Re: parsehtml

2010-06-08 Thread Iceberg
Hi Massimo, did you include it in your latest commit? If so, I guess you forgot to add a new gluon/decoder.py as well. And as a side note, I recommend you maintain two web2py environments for yourself, one for developing and then committing new features, the other used for "hg pull" and "hg update" and

[web2py] Re: parsehtml

2010-06-07 Thread mdipierro
I will include this in web2py. Thank you. On Jun 7, 5:12 pm, Alexandre Andrade wrote: > Try also: > > http://code.activestate.com/recipes/52257/ > > 2010/6/7 mdipierro > > > Amazing. Very similar. One thing that web2py TAG is missing is the > > ability to guess encoding. It fails and/or does mist

Re: [web2py] Re: parsehtml

2010-06-07 Thread Alexandre Andrade
Try also: http://code.activestate.com/recipes/52257/ 2010/6/7 mdipierro > Amazing. Very similar. One thing that web2py TAG is missing is the > ability to guess encoding. It fails and/or makes mistakes if the source > is not UTF-8 encoded. > > On Jun 6, 10:57 pm, Álvaro Justen wrote: > > This pro

Re: [web2py] Re: parsehtml

2010-06-07 Thread Alexandre Andrade
see: http://chardet.feedparser.org/ 2010/6/7 mdipierro > Amazing. Very similar. One thing that web2py TAG is missing is the > ability to guess encoding. It fails and/or makes mistakes if the source > is not UTF-8 encoded. > > On Jun 6, 10:57 pm, Álvaro Justen wrote: > > This project:http://githu

[web2py] Re: parsehtml

2010-06-06 Thread mdipierro
Amazing. Very similar. One thing that web2py TAG is missing is the ability to guess encoding. It fails and/or makes mistakes if the source is not UTF-8 encoded. On Jun 6, 10:57 pm, Álvaro Justen wrote: > This project: http://github.com/gabrielfalcao/dominic#readme > was created by a Brazilian. > May
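One simple way to avoid failing on non-UTF-8 sources is to try a short list of encodings in order before parsing. The sketch below is plain Python 3, not web2py code and not the chardet library linked above; `guess_decode` and its candidate list are hypothetical choices for illustration.

```python
from typing import Tuple


# Hypothetical helper: the name and candidate order are assumptions.
# A statistical detector such as chardet would guess other legacy encodings better.
def guess_decode(raw: bytes, candidates=("utf-8", "cp1252", "latin-1")) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    # A UTF-8 byte-order mark is an unambiguous signal, so honour it first.
    if raw.startswith(b"\xef\xbb\xbf"):
        return raw[3:].decode("utf-8"), "utf-8-sig"
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    # Only reachable if latin-1 (which accepts every byte) was left out.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


text, used = guess_decode("café".encode("cp1252"))
print(used, text)
```

Trying UTF-8 first works because random legacy bytes rarely form valid UTF-8. The fallback can only tell "valid UTF-8" apart from "probably Windows-1252/Latin-1"; it cannot distinguish other legacy encodings.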

Re: [web2py] Re: parsehtml

2010-06-06 Thread Álvaro Justen
This project: http://github.com/gabrielfalcao/dominic#readme was created by a Brazilian. Maybe it can help with web2py's HTMLParser. -- Álvaro Justen - Turicas http://blog.justen.eng.br/

[web2py] Re: parsehtml

2010-05-25 Thread mdipierro
I changed the syntax. Now it is more flexible:

>>> a=TAG('Headerthis is a test')
>>> def markdown(text,tag=None,attributes={}):
...     if tag==None: return re.sub('\s+',' ',text)
...     elif tag=='h1': return '#'+text+'\n\n'
...     elif tag=='p': return text+'\n'
...     return text
...
>>>
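The callback style above (a function that receives `text`, `tag` and `attributes`) can be illustrated outside web2py. This sketch uses Python 3's standard `html.parser` to build a small tree and then flattens it bottom-up, so each element's callback receives its already-rendered content. `Node`, `TreeBuilder`, `parse` and `flatten` are hypothetical names, not web2py's API, and the sample HTML is invented.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID = {"area", "br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag=None, attrs=()):
        self.tag = tag               # None marks the synthetic root
        self.attrs = dict(attrs)
        self.children = []           # each child is a str (text) or a Node


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node()
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray close tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def flatten(node, render):
    """Render text nodes first, then hand each element its rendered content."""
    if isinstance(node, str):
        return render(node, None, {})
    inner = "".join(flatten(child, render) for child in node.children)
    if node.tag is None:
        return inner
    return render(inner, node.tag, node.attrs)


def markdown(text, tag=None, attributes=None):
    if tag is None:
        return re.sub(r"\s+", " ", text)   # collapse whitespace in text
    if tag == "h1":
        return "# " + text + "\n\n"
    if tag == "p":
        return text + "\n"
    return text                            # unknown tags pass content through


print(flatten(parse("<h1>Header</h1><p>this is a   test</p>"), markdown))
```

Using `None` instead of `{}` as the default for `attributes` avoids Python's shared-mutable-default pitfall.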

[web2py] Re: parsehtml

2010-05-25 Thread Iceberg
On May26, 12:35am, mdipierro wrote: > I cannot push it until tonight but I have this: > > >>> a=TAG('Headerthis is a     test') > >>> print a > > Headerthis is a test>>> a.flatten() > > 'Headerthis is a     test'>>> a.flatten(filter=lambda x: re.sub('\s+',' ',x)) > > 'Headerthis is a test'>>> a.

[web2py] Re: parsehtml

2010-05-25 Thread mdipierro
I cannot push it until tonight but I have this:

>>> a=TAG('Headerthis is a test')
>>> print a
Headerthis is a test
>>> a.flatten()
'Headerthis is a test'
>>> a.flatten(filter=lambda x: re.sub('\s+',' ',x))
'Headerthis is a test'
>>> a.flatten(filter=lambda x: re.sub('\s+','-',x))
'Headerth
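The `flatten(filter=...)` idea above (drop the tags, keep the text, pass each text chunk through a function) can be sketched with Python 3's standard `html.parser`. This is a stand-in, not web2py's implementation. The parameter is named `text_filter` here to avoid shadowing Python's built-in `filter`, and the sample HTML is invented.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text nodes, passing each one through an optional filter."""

    def __init__(self, text_filter=None):
        super().__init__()
        self.text_filter = text_filter or (lambda s: s)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(self.text_filter(data))


def flatten(html, text_filter=None):
    # Hypothetical helper: strips all tags and joins the filtered text nodes.
    collector = TextCollector(text_filter)
    collector.feed(html)
    collector.close()
    return "".join(collector.parts)


doc = "<h1>Header</h1><p>this is a     test</p>"
print(repr(flatten(doc)))
print(repr(flatten(doc, lambda x: re.sub(r"\s+", " ", x))))
print(repr(flatten(doc, lambda x: re.sub(r"\s+", "-", x))))
```

The filter runs on each text node separately, and text from adjacent elements is joined with no separator. That is why the heading and paragraph run together as `Headerthis`. Note that `<script>` and `<style>` contents also arrive through `handle_data` and would be included.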

[web2py] Re: parsehtml

2010-05-25 Thread mdipierro
Working on that and the possibility to parse into markdown. > Well, not exactly an html optimizer, because our version does not > strip spaces inside text content. Just for fun.

[web2py] Re: parsehtml

2010-05-25 Thread Iceberg
Hi Massimo, Good to know you finally made it! :-) Although I am not sure where and when to use this new feature, I came up with an HTML Optimizer such as [1], in a dozen lines of web2py code. [1] http://www.iwebtool.com/html_optimizer [2] Put this inside your controller. def easter(): # This code
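A whitespace-stripping "optimizer" in the spirit of the one described above can be sketched with Python 3's standard `html.parser`. The sketch keeps start tags verbatim, drops comments, and drops text runs between tags that contain only whitespace, while leaving spaces inside text content alone. This is not Iceberg's web2py code, and `Minifier`/`minify` are hypothetical names.

```python
from html.parser import HTMLParser


class Minifier(HTMLParser):
    """Re-serialises HTML, dropping comments and whitespace-only text runs."""

    def __init__(self):
        # Keep entities such as &amp; verbatim instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.pending = []  # text and entity refs seen since the last tag

    def flush(self):
        text = "".join(self.pending)
        self.pending = []
        if text.strip():  # drop runs that are only whitespace
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        self.flush()
        self.out.append(self.get_starttag_text())  # raw text, attributes untouched

    handle_startendtag = handle_starttag  # e.g. <br/>

    def handle_endtag(self, tag):
        self.flush()
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.pending.append(data)

    def handle_entityref(self, name):
        self.pending.append(f"&{name};")

    def handle_charref(self, name):
        self.pending.append(f"&#{name};")

    def handle_comment(self, data):
        self.flush()  # the comment itself is not written out

    def handle_decl(self, decl):
        self.flush()
        self.out.append(f"<!{decl}>")


def minify(html):
    m = Minifier()
    m.feed(html)
    m.close()
    m.flush()
    return "".join(m.out)


print(minify("<ul>\n  <li>one</li>\n  <li>two &amp; three</li>\n</ul>"))
```

Entities are buffered together with text, so a space between two entities (`&lt; &gt;`) is not mistaken for whitespace-only content. One caveat: whitespace between inline elements (`<b>a</b> <i>b</i>`) or inside `<pre>` is significant, and this sketch would still drop it there.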

[web2py] Re: parsehtml

2010-05-24 Thread mdipierro
Good suggestion. Now you can do

>>> from gluon.html import web2pyHTMLParser
>>> tree = web2pyHTMLParser('helloworld').tree
>>> tree.element(_a='b')['_c']=5
>>> str(tree)
'helloworld'

works great! On May 24, 5:11 am, Iceberg wrote: > I did not try but I assume the builtin
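The example above parses HTML into a tree, finds a node with `element(_a='b')`, sets an attribute, and serialises the result. The sketch below reproduces that round trip with Python 3's standard `html.parser`. It is not the `web2pyHTMLParser` implementation. `TreeParser`, `Element` and the underscore-prefixed keyword convention (borrowed from the example) are assumptions, and the sample markup is invented because the tags in the quoted example did not survive.

```python
from html import escape
from html.parser import HTMLParser

VOID = {"area", "br", "hr", "img", "input", "link", "meta"}


class Element:
    def __init__(self, tag, attrs):
        self.tag = tag            # None marks the synthetic root
        self.attrs = dict(attrs)
        self.children = []        # str (decoded text) or Element

    def element(self, **match):
        """Pre-order search for the first descendant whose attributes match.
        Keywords use a leading underscore: _a='b' matches a="b"."""
        wanted = {k.lstrip("_"): v for k, v in match.items()}
        for child in self.children:
            if isinstance(child, Element):
                if all(child.attrs.get(k) == v for k, v in wanted.items()):
                    return child
                found = child.element(**match)
                if found is not None:
                    return found
        return None

    def __setitem__(self, key, value):
        self.attrs[key.lstrip("_")] = str(value)

    def __str__(self):
        inner = "".join(escape(c, quote=False) if isinstance(c, str) else str(c)
                        for c in self.children)
        if self.tag is None:
            return inner
        attrs = "".join(f" {k}" if v is None else f' {k}="{escape(v)}"'
                        for k, v in self.attrs.items())
        if self.tag in VOID:
            return f"<{self.tag}{attrs} />"
        return f"<{self.tag}{attrs}>{inner}</{self.tag}>"


class TreeParser(HTMLParser):
    def __init__(self, html):
        super().__init__()
        self.tree = Element(None, [])
        self.stack = [self.tree]
        self.feed(html)
        self.close()

    def handle_starttag(self, tag, attrs):
        node = Element(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray close tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


tree = TreeParser('<div a="b">hello<span>world</span></div>').tree
tree.element(_a="b")["_c"] = 5
print(str(tree))
```

Text is stored decoded and re-escaped on output, so `&amp;` survives a round trip. Boolean attributes (value `None`) are written without a value.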

[web2py] Re: parsehtml

2010-05-24 Thread Iceberg
I did not try, but I assume the built-in Python module HTMLParser already handles at least (1) tags like , not sure about (2) and (3). On May24, 4:32am, mdipierro wrote: > hmmm somehow I did not save comments in the file. > > This does not handle well: > > 1) tags like > 2) attributes that cont

[web2py] Re: parsehtml

2010-05-23 Thread mdipierro
hmmm somehow I did not save comments in the file. This does not handle well:
1) tags like
2) attributes that contain > in quotes
3) attributes that contain escaped quotes
On May 23, 10:46 am, Massimo Di Pierro wrote: > Anybody interested in helping with this? > > It scrapes an HTML file
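Iceberg's reply above suggests relying on Python's built-in HTMLParser. Points 2 and 3 can be checked directly with a small sketch using Python 3's `html.parser`. In HTML, a quote inside an attribute value is escaped as an entity such as `&quot;`. A backslash is not an escape character, so `\"` would end the value. The example tag for point 1 did not survive in the archive. The sketch only *assumes* it refers to self-closing tags such as `<br/>`. `TagRecorder` and `start_tags` are hypothetical names.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Records (tag, attributes) for every start tag, including self-closing ones."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def start_tags(html):
    r = TagRecorder()
    r.feed(html)
    r.close()
    return r.tags


# 2) A '>' inside a quoted attribute value does not end the tag.
print(start_tags('<a title="x > y">link</a>'))
# 3) Entity-escaped quotes are decoded in the attribute value.
print(start_tags('<a title="say &quot;hi&quot;">link</a>'))
# 1?) Self-closing tags are reported as start tags (assumed meaning of point 1).
print(start_tags('<p>a<br/>b</p>'))
```

The default `handle_startendtag` forwards `<br/>` to `handle_starttag`, which is why it shows up in the recorded list.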

[web2py] Re: parsehtml

2010-05-23 Thread mdipierro
Nothing goes wrong in the example, but try a different input string and the parsing will be wrong. This can be used for screen scraping, for example, without the need to install BeautifulSoup, using only existing web2py functionality. On May 23, 1:37 pm, Iceberg wrote: > I ran the code and nothi
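Here is a small illustration of the screen-scraping use case without BeautifulSoup: a link extractor built only on Python 3's standard `html.parser`. It does not use web2py's parser, and `LinkScraper` and `scrape_links` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Collects (href, link text) pairs from <a href=...> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise internal whitespace in the link text.
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None


def scrape_links(html):
    s = LinkScraper()
    s.feed(html)
    s.close()
    return s.links


print(scrape_links('<p>See <a href="/a">first <b>link</b></a> and <a href="/b">second</a>.</p>'))
```

To scrape a live page, read the bytes with `urllib.request.urlopen(url).read()` and decode them to `str` first. The parser expects text, not bytes. Anchors without `href` are skipped.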

[web2py] Re: parsehtml

2010-05-23 Thread Iceberg
I ran the code and nothing goes wrong, no exception either. By the way, what is the use case of this HTMLParser? Regards, Iceberg On May23, 11:46pm, Massimo Di Pierro wrote: > Anybody interested in helping with this? > > It scrapes an HTML file and converts it into a tree hierarchy of web2py >