There are docstrings. I will write something more ASAP.
On May 25, 10:28 pm, weheh wrote:
> This is very nice. I think Thadeus' point is well made. I agree it's
> useful. It is fringe, but I absolutely need this and will be using it
> on my current project. Where's the doc?
This is very nice. I think Thadeus' point is well made. I agree it's
useful. It is fringe, but I absolutely need this and will be using it
on my current project. Where's the doc?
On Tue, May 25, 2010 at 12:11, mdipierro wrote:
> Here is a one-liner to remove all tags from some HTML text:
>
> >>> html = '<b>hello</b><i>world</i>'
> >>> print TAG(html).flatten()
>
> helloworld
Very good!
--
Álvaro Justen - Turicas
http://blog.justen.eng.br/
21 9898-0141
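For comparison, the same tag-stripping idea can be sketched with nothing but Python's standard-library parser. This is only an illustration of the technique, not web2py's actual TAG implementation (Python 3 syntax shown):

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collect only text content, discarding all tags."""
    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        # called for the text between tags
        self.pieces.append(data)

def strip_tags(html):
    """Return the text of `html` with all tags removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return ''.join(parser.pieces)

print(strip_tags('<b>hello</b><i>world</i>'))  # -> helloworld
```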
It makes assumptions. It fails when Python's HTMLParser fails. For
example:
>>> from gluon.html import TAG
>>> print TAG('aaa<b>bbb</b><i>ddd</i>eee')
aaa<b>bbb</b><i>ddd</i>eee
>>> print TAG('aaa<b>bbb<i>ddd</i>eee</b>')
aaa<b>bbb<i>ddd</i>eee</b>
>>> print TAG('aaa<b <i>ddd</i>eee')
Traceback (most recent call last):
  ...
HTMLParser.HTMLParseError: malformed start tag
How robust have you found HTMLParser with badly formed HTML?
On May 26, 1:11 am, mdipierro wrote:
> Here is a one-liner to remove all tags from some HTML text:
>
> >>> html = '<b>hello</b><i>world</i>'
> >>> print TAG(html).flatten()
>
> helloworld
>
> On May 25, 10:02 am, mdipierro wrote:
>
> > yet a better syntax and more API:
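On the robustness question: Python 3's html.parser is quite tolerant in practice (it dropped strict mode, so HTMLParseError no longer exists there); mis-nested or unclosed tags just produce parse events rather than exceptions. A small probe, purely as an illustration:

```python
from html.parser import HTMLParser

class EventLogger(HTMLParser):
    """Record parse events so we can see how malformed input is handled."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_data(self, data):
        self.events.append(('data', data))

parser = EventLogger()
# mis-nested tags: the </b> closes before </i> ever appears,
# yet no exception is raised; events are reported as-is
parser.feed('aaa<b>bbb<i>ddd</b>eee')
parser.close()
print(parser.events)
```

Note that the parser does not try to repair the tree: the `<i>` start tag is reported even though no matching end tag ever arrives.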
Was going to say "web2pyHTMLParser" is too cumbersome - glad you
changed to "TAG"
I do some scraping with lxml so am also wary about including this, but
the examples look very convenient.
On May 26, 1:11 am, mdipierro wrote:
> Here is a one-liner to remove all tags from some HTML text:
>
Here is a one-liner to remove all tags from some HTML text:
>>> html = '<b>hello</b><i>world</i>'
>>> print TAG(html).flatten()
helloworld
On May 25, 10:02 am, mdipierro wrote:
> yet a better syntax and more API:
>
> 1) no more web2pyHTMLParser; use TAG(...) instead, and flatten() (removes
> tags)
>
yet a better syntax and more API:
1) no more web2pyHTMLParser; use TAG(...) instead, and flatten() (removes
tags)
>>> a=TAG('<div>Hello<span>world</span></div>')
>>> print a
<div>Hello<span>world</span></div>
>>> print a.element('span')
<span>world</span>
>>> print a.flatten()
Helloworld
2) search by multiple conditions, including regex
for example, find all
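The element() lookup and the multi-condition search described above can be modeled over a toy tree with just the standard library. The Node class and find() helper here are hypothetical illustrations of the idea (match by tag name plus regex conditions on attributes), not web2py's actual API:

```python
import re

class Node:
    """A toy DOM-ish node: a tag name, attributes, and children
    (child nodes or plain strings)."""
    def __init__(self, tag, attrs=None, children=None):
        self.tag = tag
        self.attrs = attrs or {}
        self.children = children or []

def find(node, tag=None, attr_patterns=None):
    """Depth-first search for the first node matching the tag name
    and every regex condition on its attributes."""
    attr_patterns = attr_patterns or {}
    if ((tag is None or node.tag == tag) and
            all(re.search(pat, node.attrs.get(key, ''))
                for key, pat in attr_patterns.items())):
        return node
    for child in node.children:
        if isinstance(child, Node):
            found = find(child, tag, attr_patterns)
            if found is not None:
                return found
    return None

tree = Node('div', children=['Hello',
                             Node('span', {'class': 'greeting'}, ['world'])])
match = find(tree, tag='span', attr_patterns={'class': r'^greet'})
print(match.children[0])  # -> world
```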
The entire code is 40 lines and uses the Python built-in HTML parser.
It will not be a problem to maintain it. Actually, we could even use
this to simplify both XML(...,sanitize) and gluon.contrib.markdown.WIKI.
On May 25, 12:50 am, Thadeus Burgess wrote:
> > So why our own?
>
> Because it converts it into web2py helpers.
> So why our own?
Because it converts it into web2py helpers.
And you don't have to deal with installing anything other than web2py.
--
Thadeus
On Tue, May 25, 2010 at 12:14 AM, Kevin Bowling wrote:
> Hmm, I wonder if this is worth the possible maintenance cost? It also
> transcends the role of a web framework and now you are getting into
> network programming.
Hmm, I wonder if this is worth the possible maintenance cost? It also
transcends the role of a web framework and now you are getting into
network programming.
I have a currently deployed screen scraping app and found PyQuery to
be more than adequate. There is also lxml directly, or Beautiful
Soup.