Hello,
I'm puzzled by this test I ran while trying to convert an HTML page to plain text. I can pass neither a unicode object nor a str to feed(), so how can I do this?
[EMAIL PROTECTED]:~$ python2.4
Python 2.4.1c2 (#2, Mar 19 2005, 01:04:19)
[GCC 3.3.5 (Debian 1:3.3.5-12)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import formatter
>>> import htmllib
>>> html2txt = htmllib.HTMLParser(formatter.AbstractFormatter(formatter.DumbWriter()))
>>> html2txt.feed(u'D\xe9but')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/sgmllib.py", line 95, in feed
    self.goahead(0)
  File "/usr/lib/python2.4/sgmllib.py", line 120, in goahead
    self.handle_data(rawdata[i:j])
  File "/usr/lib/python2.4/htmllib.py", line 65, in handle_data
    self.formatter.add_flowing_data(data)
  File "/usr/lib/python2.4/formatter.py", line 197, in add_flowing_data
    self.writer.send_flowing_data(data)
  File "/usr/lib/python2.4/formatter.py", line 421, in send_flowing_data
    write(word)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)
>>> html2txt.feed(u'D\xe9but'.encode('latin1'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/sgmllib.py", line 94, in feed
    self.rawdata = self.rawdata + data
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 1: ordinal not in range(128)
>>> html2txt.feed('Début')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/sgmllib.py", line 94, in feed
    self.rawdata = self.rawdata + data
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
>>>
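
For comparison, here is a minimal sketch of the same task with the encoding steps made explicit. It is not the htmllib/formatter setup from the session above: those modules are not available in current Python 3, so this uses html.parser from the standard library instead. The names TextExtractor and html_to_text are hypothetical, and the UTF-8 default is an assumption. The idea it illustrates is to decode the bytes once before feeding the parser and to encode once, with an explicit codec, when writing the result, rather than letting an implicit ASCII conversion happen in between.

```python
import io
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects character data and drops the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of text between tags.
        self.chunks.append(data)


def html_to_text(raw_bytes, encoding="utf-8"):
    """Decode at the input boundary so the parser only ever sees text."""
    parser = TextExtractor()
    parser.feed(raw_bytes.decode(encoding))
    parser.close()
    return "".join(parser.chunks)


# Input arrives as bytes (here built from an escaped literal).
text = html_to_text("<p>D\u00e9but</p>".encode("utf-8"))

# Encode at the output boundary with an explicit codec, instead of
# relying on a default such as the ASCII codec seen in the tracebacks.
out = io.BytesIO()
out.write(text.encode("utf-8"))
```

In this sketch the parser never mixes byte strings and text, which appears to be what triggers the second and third tracebacks once a unicode string has already been fed to the same parser object.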
-- 
(°>  Nicolas Évrard
/ )  Liège - Belgique
^^