[EMAIL PROTECTED] wrote:
> class mvbHTMLParser(htmllib.HTMLParser):
>     def __init__(self, formatter, verbose=0):
>         htmllib.HTMLParser.__init__(self, formatter, verbose)
    def anchor_end(self):
        self.anchor = None
[...]
> then the output is:
> text_text
> a_link[1]
>
>
Sorry, I needed some sleep.
It works OK.
But if you are willing to answer one more question:
I use this code:
--
import StringIO
import re
import urllib2, htmllib, formatter
class mvbHTMLParser(htmllib.HTMLParser):
    def __init__(self, formatter, verbose
[EMAIL PROTECTED] wrote:
I did this, but it doesn't work:
It is quite possible I misunderstood the problem you
were having. I am familiar with a problem with StringIO
whereby if you call close() on it, you can no longer call
getvalue() afterwards. Perhaps that's not the problem
you were seeing.
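For illustration, here is a minimal sketch of the close()/getvalue() behaviour described above. It assumes `io.StringIO` (available in Python 2.6+ and Python 3) rather than the `StringIO` module used in this thread, and the sample text is made up:

```python
import io

buf = io.StringIO()
buf.write(u"text_text\n")

# Read the contents while the buffer is still open.
text = buf.getvalue()
buf.close()

# After close(), getvalue() raises ValueError instead of returning the text.
try:
    buf.getvalue()
    closed_error = None
except ValueError as exc:
    closed_error = exc
```

So if the text is needed after the parser is done, calling getvalue() before any close() on the buffer avoids the error.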
Can you
[EMAIL PROTECTED] wrote:
> I do this to get an HTML-to-text file
[...]
> But then the _ characters are gone.
> Is it possible to keep that character in file.getvalue()?
Just to make sure: you did look into the HTML file and verified that there
are actually underscores and not spaces that are _rende
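One way to run that check without reading the markup by eye is to count the underscores in the raw HTML before it goes through the parser. This is only a sketch: `count_underscores` is a hypothetical helper, and the sample markup is invented for the example.

```python
import re

def count_underscores(html):
    """Return (literal, refs): literal '_' characters and numeric
    character references to '_' (&#95; or &#x5f;) found in the markup."""
    literal = html.count("_")
    refs = len(re.findall(r"&#(?:0*95|[xX]0*5[fF]);", html))
    return literal, refs

# Hypothetical sample; in practice pass the string that is fed to the parser.
sample = "<p>text_text</p><a href='x'>a_link</a><p>a&#95;b</p>"
literal, refs = count_underscores(sample)
```

If both counts are zero, the underscores were never in the source and the parser is not the one dropping them.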
Hmm, I'm a newbie with Python.
I did this, but it doesn't work:
class mvbHTMLParser(htmllib.HTMLParser):
    def __init__(self, formatter, verbose=0):
        htmllib.HTMLParser.__init__(self, formatter, verbose)
        self.imglist = []
    def handle_image(self, src, alt, *args):
        self.imglist.ap
[EMAIL PROTECTED] wrote:
file = StringIO.StringIO()
f = formatter.AbstractFormatter(formatter.DumbWriter(file))
p = mvbHTMLParser(f)
p.feed(html)
p.close()
print file.getvalue()
But then the _ characters are gone.
Is it possible to keep that character in file.getvalue()?
I consider this a defect in
Hi!
I do this to get an HTML-to-text file:
class mvbHTMLParser(htmllib.HTMLParser):
    def __init__(self, formatter, verbose=0):
        htmllib.HTMLParser.__init__(self, formatter, verbose)
        self.imglist = []
    def handle_image(self, src, alt, *args):
        self.imglist.append(src)
file =