Grant Edwards <[EMAIL PROTECTED]> wrote:
> First, make it work. Then make it work right. Then worry
> about how fast it is.
> "Premature optimization..."
That could be - but then again, most of the comments I've seen for that
particular issue are for rather old releases.
>> It seems to use a
On 2005-02-26, Paul Rubin wrote:
> Jorgen Grahn <[EMAIL PROTECTED]> writes:
>> You should probably do what some other poster suggested -- download
>> lynx or some other text-only browser and make your code execute it
>> in -dump mode to get the text-formatted html. You'll get that
>> working in an
On 26 Feb 2005 02:36:31 -0800, Paul Rubin wrote:
> Jorgen Grahn <[EMAIL PROTECTED]> writes:
>> You should probably do what some other poster suggested -- download
>> lynx or some other text-only browser and make your code execute it
>> in -dump mode to get the text-formatted html. You'll get tha
Michael Spencer <[EMAIL PROTECTED]> writes:
> Mike Meyer wrote:
>
>> It also fails on tags with a ">" in a string in the tag. That's
>> well-formed but ill-used HTML.
> True enough...however, it doesn't fail too horribly:
> >>> striptags("""the text""")
> "'>the text"
> >>>
De
Jorgen Grahn <[EMAIL PROTECTED]> writes:
> You should probably do what some other poster suggested -- download
> lynx or some other text-only browser and make your code execute it
> in -dump mode to get the text-formatted html. You'll get that
> working in an hour or so, and then you can see if you
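A minimal sketch of the lynx suggestion above, assuming lynx is installed and on the PATH. The helper names (lynx_dump_command, html_to_text_via_lynx) are made up for illustration and do not come from the thread; the flags used are lynx's -dump, -stdin and -force_html.

```python
import shutil
import subprocess


def lynx_dump_command(lynx_path="lynx"):
    """Build the argument list for a text dump of HTML read from stdin.

    Hypothetical helper: the name and defaults are assumptions.
    """
    return [lynx_path, "-dump", "-stdin", "-force_html"]


def html_to_text_via_lynx(html):
    """Pipe an HTML string through lynx and return the formatted text."""
    lynx = shutil.which("lynx")
    if lynx is None:
        raise RuntimeError("lynx was not found on PATH")
    result = subprocess.run(
        lynx_dump_command(lynx),
        input=html,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if lynx exits non-zero
    )
    return result.stdout
```

Calling html_to_text_via_lynx("<p>Hello <b>world</b></p>") should return the paragraph as plain text, with whatever indentation and wrapping lynx applies. Leaving the rendering to the browser means badly formed HTML is handled by lynx's own error recovery rather than by your code.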
gf gf wrote:
Hi. I'm looking for a Python lib to convert HTML to
ASCII.
You might find these threads on comp.lang.python interesting:
http://tinyurl.com/5zmpn
http://tinyurl.com/6mxmb
Kent
Mike Meyer wrote:
It also fails on tags with a ">" in a string in the tag. That's
well-formed but ill-used HTML.
True enough...however, it doesn't fail too horribly:
>>> striptags("""the text""")
"'>the text"
>>>
and I think that case could be rectified rather easily, by stripping an
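One way to sidestep the quoted ">" problem discussed in this message is to let a real tokenizer find the tag boundaries. The sketch below uses html.parser from the modern Python standard library, which is an assumption on my part and not what the posters used; the names TextOnly and strip_tags_with_parser are hypothetical.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only character data; tags and comments are ignored."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags (entities already decoded).
        self.chunks.append(data)


def strip_tags_with_parser(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


print(strip_tags_with_parser("<a title='>'>the text</a>"))  # the text
```

Because the parser understands quoted attribute values, a ">" inside the quotes does not end the tag early. Note that this sketch also keeps the contents of script and style elements, which a real converter would probably want to drop.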
On Fri, 25 Feb 2005 10:51:47 -0800 (PST), gf gf <[EMAIL PROTECTED]> wrote:
> Hans,
>
> Thanks for the tip. I took a look at Beautiful Soup,
> and it looked like it was a framework to parse HTML.
This is my understanding, too.
> I'm not really interested in going through it tag by
> tag - just t
Michael Spencer <[EMAIL PROTECTED]> writes:
> gf gf wrote:
>> [wants to extract ASCII from badly-formed HTML and thinks BeautifulSoup is
>> too complex]
>
> You haven't specified what you mean by "extracting" ASCII, but I'll
> assume that you want to start by eliminating html tags and comments,
>
gf gf wrote:
[wants to extract ASCII from badly-formed HTML and thinks BeautifulSoup is too complex]
You haven't specified what you mean by "extracting" ASCII, but I'll assume that
you want to start by eliminating html tags and comments, which is easy enough
with a couple of regular expressions:
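The code from the original message is not included in this preview. As a rough illustration of the two-regex idea only, here is a minimal sketch; the pattern and function names are assumptions, not the poster's actual code.

```python
import re

# Hypothetical patterns, not taken from the original post.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)  # HTML comments, possibly multi-line
TAG_RE = re.compile(r"<[^>]*>")                    # anything from "<" to the next ">"


def strip_tags_sketch(html):
    """Remove comments first, then anything that looks like a tag."""
    without_comments = COMMENT_RE.sub("", html)
    return TAG_RE.sub("", without_comments)


print(strip_tags_sketch("<p>Hello <!-- note --><b>world</b></p>"))  # Hello world
```

Comments are removed first so that a ">" inside a comment cannot confuse the tag pattern. The tag pattern itself is naive: it stops at the first ">", so a ">" inside a quoted attribute value leaves debris behind. For example, strip_tags_sketch("<a title='>'>the text</a>") returns "'>the text".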
Hans,
Thanks for the tip. I took a look at Beautiful Soup,
and it looked like it was a framework to parse HTML.
I'm not really interested in going through it tag by
tag - just to get it converted to ASCII. How can I do
this with B. Soup?
--Thanks
PS William - thanks for the reference to lynx,
Try Beautiful Soup!
1) Be able to handle badly formed, or illegal, HTML,
as best as possible.
From the description:
"It won't choke if you give it ill-formed markup: it'll just give you access to
a correspondingly ill-formed data structure."
Can anyone direct me to something which could help me
gf gf <[EMAIL PROTECTED]> wrote:
> Hi. I'm looking for a Python lib to convert HTML to
> ASCII. Of course, a quick Google search showed
> several options (although, I must say, less than I
> would expect, considering how easy this is to do in
> *other* languages... :| ), but, I have 2 requirement
Hi. I'm looking for a Python lib to convert HTML to
ASCII. Of course, a quick Google search showed
several options (although, I must say, less than I
would expect, considering how easy this is to do in
*other* languages... :| ), but, I have 2 requirements,
which none of them seem to meet:
1) Be