Re: file.getvalue() with _ or other characters

2005-03-04 Thread Peter Otten
[EMAIL PROTECTED] wrote:
> class mvbHTMLParser(htmllib.HTMLParser):
>     def __init__(self, formatter, verbose=0):
>         htmllib.HTMLParser.__init__(self,formatter,verbose)

    def anchor_end(self):
        self.anchor = None

[...]
> then the output is:
> text_text
> a_link[1]

Re: file.getvalue() with _ or other characters

2005-03-04 Thread martijn
Sorry, I needed some sleep. It works OK. But if you want to answer a question, I use this code:
--
import StringIO
import re
import urllib2,htmllib, formatter

class mvbHTMLParser(htmllib.HTMLParser):
    def __init__(self, formatter, verbose

Re: file.getvalue() with _ or other characters

2005-03-03 Thread Peter Hansen
[EMAIL PROTECTED] wrote:
> I did this but don't work:

It is quite possible I misunderstood the problem you were having. I am familiar with a problem with StringIO whereby if you call close() on it, you can no longer call getvalue() afterwards. Perhaps that's not the problem you were seeing. Can you
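
The close()-then-getvalue() pitfall mentioned above can be reproduced in a few lines. This is only a minimal sketch: it assumes Python 3's io.StringIO rather than the Python 2 StringIO module used in the thread, and the variable names are illustrative.

```python
import io

buf = io.StringIO()
buf.write("text_text\n")

# Read the contents while the buffer is still open.
contents = buf.getvalue()

buf.close()

# Once the buffer is closed, io.StringIO.getvalue() raises ValueError.
try:
    buf.getvalue()
    closed_error = None
except ValueError as exc:
    closed_error = exc
```

The practical consequence is to call getvalue() before close(), or not to close the buffer until its contents have been copied out.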

Re: file.getvalue() with _ or other characters

2005-03-03 Thread Peter Otten
[EMAIL PROTECTED] wrote:
> I do this to get a htmlTOtext file
[...]
> But then the _ characters are away.
> is it possible to keep that character in file.getvalue()

Just to make sure: you did look into the HTML file and verified that there are actually underscores and not spaces that are _rende

Re: file.getvalue() with _ or other characters

2005-03-03 Thread martijn
Mmm, I'm a newbie with Python. I did this but it doesn't work:

class mvbHTMLParser(htmllib.HTMLParser):
    def __init__(self, formatter, verbose=0):
        htmllib.HTMLParser.__init__(self,formatter,verbose)
        self.imglist = []

    def handle_image(self,src,alt,*args):
        self.imglist.ap

Re: file.getvalue() with _ or other characters

2005-03-03 Thread Peter Hansen
[EMAIL PROTECTED] wrote:
> file = StringIO.StringIO()
> f = formatter.AbstractFormatter(formatter.DumbWriter(file))
> p = mvbHTMLParser(f)
> p.feed(html)
> p.close()
> print file.getvalue()
>
> But then the _ characters are away.
> is it possible to keep that character in file.getvalue()

I consider this a defect in

file.getvalue() with _ or other characters

2005-03-03 Thread martijn
Hi! I do this to get a htmlTOtext file:

class mvbHTMLParser(htmllib.HTMLParser):
    def __init__(self, formatter, verbose=0):
        htmllib.HTMLParser.__init__(self,formatter,verbose)
        self.imglist = []

    def handle_image(self,src,alt,*args):
        self.imglist.append(src)

file =
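
For readers on Python 3, where htmllib is no longer available, the same HTML-to-text idea can be sketched with html.parser.HTMLParser. This is not the code from the thread: the class name TextAndImages, the text() helper, and the sample HTML are hypothetical, and the sketch simply collects character data as-is, so underscores in the source survive.

```python
from html.parser import HTMLParser


class TextAndImages(HTMLParser):
    """Hypothetical stand-in for the thread's mvbHTMLParser."""

    def __init__(self):
        super().__init__()
        self.chunks = []    # pieces of text found between tags
        self.imglist = []   # src attributes of <img> tags

    def handle_starttag(self, tag, attrs):
        # Record image sources, like handle_image() in the original class.
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.imglist.append(src)

    def handle_data(self, data):
        # Character data is stored unchanged, underscores included.
        self.chunks.append(data)

    def text(self):
        return "".join(self.chunks)


# Made-up sample input for illustration.
html = '<p>text_text</p><a href="x.html">a_link</a><img src="pic.png">'
p = TextAndImages()
p.feed(html)
p.close()
result = p.text()
```

Because no formatter or writer sits between the parser and the output here, nothing reflows or rewrites the text; whether that matches the layout produced by DumbWriter is a separate question.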