On Thu, Dec 22, 2011 at 10:58 AM, Chris Angelico <ros...@gmail.com> wrote:
> Firstly, are you using Python 2 or Python 3? Things will be slightly
> different, since the default 'str' object in Py3 is Unicode.
>

2

> I would guess that your page is being output as UTF-8; you may find
> that the solution is as easy as declaring the encoding of your text
> file when you read it in.
>

So I tried this:

    file = open(p + "2.txt")
    for line in file:
        print unicode(line, 'utf-8')

and got this error:

  142     print unicode(line, 'utf-8')
  143
  144     print '''<br /><br /><form id="signup" action=" http://13gems.com/Sign_Up.py" method="post" target="_blank">
*builtin* *unicode* = <type 'unicode'>, *line* = '<span class="text">\r\n'

 /usr/lib64/python2.4/encodings/utf_8.py<file:///usr/lib64/python2.4/encodings/utf_16.py> in *decode*(input=<read-only buffer ptr 0x2b197e378454, size 21>, errors='strict')
   14
   15 def decode(input, errors='strict'):
   16     return codecs.utf_16_decode(input, errors, True)
   17
   18 class StreamWriter(codecs.StreamWriter):
*global* *codecs* = <module 'codecs' from '/usr/lib64/python2.4/codecs.pyc'>, codecs.*utf_16_decode* = <built-in function utf_16_decode>, *input* = <read-only buffer ptr 0x2b197e378454, size 21>, *errors* = 'strict', *builtin* *True* = True

*UnicodeDecodeError*: 'utf16' codec can't decode byte 0x0a in position 20: truncated data
      args = ('utf16', '<span class="text">\r\n', 20, 21, 'truncated data')
      encoding = 'utf16'
      end = 21
      object = '<span class="text">\r\n'
      reason = 'truncated data'
      start = 20

Tried it with utf-16 with same results.

TIA,
Stan
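
P.S. For reference, here is a minimal sketch of the "declare the encoding when you read it in" suggestion, plus a check on the exact line shown in the traceback. The names read_lines_decoded and sample are made up for illustration and are not in the real script. It assumes the file really is UTF-8, and it avoids 'with' and b'' literals so that it should also run on Python 2.4. The check reflects that UTF-16 works in 2-byte units, so a 21-byte line cannot decode cleanly as UTF-16, which is consistent with the 'truncated data' reason reported above.

```python
# Sketch only: read_lines_decoded and sample are illustrative names,
# not part of the original script.

def read_lines_decoded(path, encoding='utf-8'):
    """Read the whole file as bytes, decode it once, return unicode lines."""
    f = open(path, 'rb')              # binary mode: no newline translation
    try:
        raw = f.read()
    finally:
        f.close()                     # try/finally, since Python 2.4 has no 'with'
    # Passing True keeps the line endings (e.g. '\r\n') on each line.
    return raw.decode(encoding).splitlines(True)

# The line from the traceback, built as a byte string on both Python 2 and 3.
sample = '<span class="text">\r\n'.encode('ascii')

# Plain ASCII is valid UTF-8, so this decode succeeds.
decoded = sample.decode('utf-8')

# An odd number of bytes cannot be split into 2-byte UTF-16 code units.
try:
    sample.decode('utf-16')
    utf16_ok = True
except UnicodeDecodeError:
    utf16_ok = False
```

If that holds, unicode(line, 'utf-8') on this particular line should not fail, so it may be worth checking whether the traceback came from the utf-16 attempt instead.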
-- http://mail.python.org/mailman/listinfo/python-list