posting...
On Jun 8, 2:30 am, Iceberg wrote:
Hi Massimo, did you include it in your latest commit? If so, I guess
you forgot to add the new gluon/decoder.py as well.
And as a side note, I recommend you maintain two web2py environments
for yourself: one for developing and committing new features, the other
used for "hg pull" and "hg update" and
I will include this in web2py. Thank you
On Jun 7, 5:12 pm, Alexandre Andrade wrote:
Try also:
http://code.activestate.com/recipes/52257/
2010/6/7 mdipierro
> Amazing. Very similar. One thing that web2py TAG is missing is the
> ability to guess encoding. It fails and/or makes mistakes if the source
> is not UTF8 encoded.
>
> On Jun 6, 10:57 pm, Álvaro Justen wrote:
> > This pro
see:
http://chardet.feedparser.org/
2010/6/7 mdipierro
> Amazing. Very similar. One thing that web2py TAG is missing is the
> ability to guess encoding. It fails and/or makes mistakes if the source
> is not UTF8 encoded.
>
> On Jun 6, 10:57 pm, Álvaro Justen wrote:
> > This project:http://githu
Amazing. Very similar. One thing that web2py TAG is missing is the
ability to guess encoding. It fails and/or makes mistakes if the source
is not UTF8 encoded.
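To illustrate the encoding problem described above, here is a minimal sketch that tries a short list of candidate encodings in order and keeps the first one that decodes cleanly. It is not what web2py does, and it is far cruder than the statistical detection offered by the chardet library linked elsewhere in this thread. The function name decode_guessing and the candidate list are assumptions made for this example, and the code targets Python 3.

```python
def decode_guessing(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes raw.

    Hypothetical helper: the candidate order is an assumption, not a
    recommendation. UTF-8 goes first because it rejects most byte
    sequences that were produced by a different encoding.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Reached only if every candidate failed (latin-1 itself never fails).
    return raw.decode("utf-8", errors="replace"), "utf-8"


if __name__ == "__main__":
    text, used = decode_guessing(b"\xc1lvaro")
    print(used, text)
```

A first-match strategy like this can pick a wrong but error-free encoding, which is why a real detector is preferable when the input is not under your control.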
On Jun 6, 10:57 pm, Álvaro Justen wrote:
This project:
http://github.com/gabrielfalcao/dominic#readme
was created by a Brazilian.
Maybe it can help with web2py HTMLParser.
--
Álvaro Justen - Turicas
http://blog.justen.eng.br/
21 9898-0141
I changed the syntax. Now it is more flexible:
>>> a=TAG('Headerthis is a test')
>>> def markdown(text,tag=None,attributes={}):
...     if tag is None: return re.sub('\s+',' ',text)
...     elif tag=='h1': return '#'+text+'\n\n'
...     elif tag=='p': return text+'\n'
...     return text
...
>>>
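The archived snippet above appears to have lost the markup inside the TAG(...) string, so the following is a standalone sketch of the same idea rather than web2py's code. It assumes the input looked something like an h1 element followed by a p element, and it uses Python 3's standard html.parser instead of TAG. The Flattener class and the flatten function are hypothetical names; only the markdown callback follows the post. The sketch assumes well-formed input with every tag closed.

```python
import re
from html.parser import HTMLParser


def markdown(text, tag=None, attributes=None):
    # Same rules as the callback in the post: bare text gets its
    # whitespace collapsed, h1 and p get markdown-style wrapping.
    if tag is None:
        return re.sub(r"\s+", " ", text)
    elif tag == "h1":
        return "#" + text + "\n\n"
    elif tag == "p":
        return text + "\n"
    return text


class Flattener(HTMLParser):
    """Hypothetical stand-in for a flatten-with-callback operation."""

    def __init__(self, render):
        super().__init__()
        self.render = render
        self.buffers = [[]]   # one list of rendered pieces per open element
        self.open_tags = []   # (tag, attributes) for each open element

    def handle_starttag(self, tag, attrs):
        self.open_tags.append((tag, dict(attrs)))
        self.buffers.append([])

    def handle_data(self, data):
        self.buffers[-1].append(self.render(data))

    def handle_endtag(self, tag):
        if not self.open_tags:
            return  # ignore a stray closing tag
        name, attributes = self.open_tags.pop()
        inner = "".join(self.buffers.pop())
        self.buffers[-1].append(self.render(inner, name, attributes))


def flatten(html_text, render):
    parser = Flattener(render)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.buffers[0])


if __name__ == "__main__":
    print(flatten("<h1>Header</h1><p>this is a test</p>", markdown))
```

The callback is applied bottom-up: text nodes are rendered first, then each element is rendered with the already-rendered text of its children.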
On May26, 12:35am, mdipierro wrote:
I cannot push it until tonight but I have this:
>>> a=TAG('Headerthis is a test')
>>> print a
Headerthis is a test
>>> a.flatten()
'Headerthis is a test'
>>> a.flatten(filter=lambda x: re.sub('\s+',' ',x))
'Headerthis is a test'
>>> a.flatten(filter=lambda x: re.sub('\s+','-',x))
'Headerth
Working on that and the possibility to parse into markdown.
> Well, not exactly an html optimizer, because our version does not
> strip spaces inside text content. Just for fun.
Hi Massimo, Good to know you finally made it! :-)
Although I don't yet know where and when to use this new feature, I came up
with an HTML optimizer like [1] in a dozen lines of web2py code.
[1] http://www.iwebtool.com/html_optimizer
[2] Put this inside your controller.
def easter(): # This code
Good suggestion. Now you can do
>>> from gluon.html import web2pyHTMLParser
>>> tree = web2pyHTMLParser('helloworld').tree
>>> tree.element(_a='b')['_c']=5
>>> str(tree)
'helloworld'
works great!
On May 24, 5:11 am, Iceberg wrote:
I did not try, but I assume the builtin Python module HTMLParser
already handles at least (1) tags like , not sure about (2)
and (3).
On May24, 4:32am, mdipierro wrote:
hmmm somehow I did not save comments in the file.
This does not handle well:
1) tags like
2) attributes that contain > in quotes
3) attributes that contain escaped quotes
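Cases 2 and 3 in the list above can be checked against Python 3's standard html.parser with a small event collector. This is only a sketch for experimenting, not the web2py parser under discussion. The class TagCollector and the helper collect are hypothetical names, and "escaped quotes" is assumed here to mean entity-escaped quotes (&quot;), since HTML has no backslash escapes.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record start and end tag events so parsing can be inspected."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def collect(html_text):
    parser = TagCollector()
    parser.feed(html_text)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # Case 2: a ">" inside a quoted attribute value.
    print(collect('<a title="x > y">t</a>'))
    # Case 3 (as assumed above): entity-escaped quotes inside a value.
    print(collect('<a title="say &quot;hi&quot;">t</a>'))
```

Feeding the same strings to any hand-written parser gives a quick way to compare its behaviour with the standard library on these edge cases.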
On May 23, 10:46 am, Massimo Di Pierro wrote:
> Anybody interested in helping with this?
>
> It scrapes an html files
Nothing goes wrong in the example, but try a different input
string and you will get wrong parsing.
This can be used for screen scraping, for example, without needing to
install BeautifulSoup, just using existing web2py functionality.
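As a rough idea of what dependency-free screen scraping can look like, here is a minimal sketch that pulls link targets out of a page using only Python 3's standard html.parser. It does not use the web2py parser discussed in this thread. LinkScraper and scrape_links are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Collect the href of every anchor tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def scrape_links(html_text):
    scraper = LinkScraper()
    scraper.feed(html_text)
    scraper.close()
    return scraper.links


if __name__ == "__main__":
    page = '<p><a href="http://web2py.com">web2py</a> <a name="x">anchor</a></p>'
    print(scrape_links(page))
```

An event-based parser like this is enough for simple extraction; a tree with search methods becomes more convenient once the queries depend on nesting.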
On May 23, 1:37 pm, Iceberg wrote:
I ran the code and nothing went wrong, no exception either.
By the way, what is the use case of this HTMLParser?
Regards,
Iceberg
On May23, 11:46pm, Massimo Di Pierro wrote:
> Anybody interested in helping with this?
>
> It scrapes an html files and converts into a tree hierarchy of web2py
>