Neil Hodgson wrote:

> > The only issue I'm having relates to Unicode. MoinMoin and python are
> > pretty unforgiving about files that contain Unicode characters that
> > aren't included in the coding properly. I've spent hours reading about
> > Unicode, and playing with different encoding/decoding commands, but at
> > this point, I just want a hacky solution that will ignore the
> > improperly coded characters or replace them with placeholders.
>
> Call the codec with the errors argument set to "ignore" or "replace".
>
> >>> unicode('AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \x96 For references see blahblah.\n\n\n-----\n\n', 'utf8')
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
>   File "c:\python24\lib\encodings\utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 58: unexpected code byte
> >>> unicode('AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \x96 For references see blahblah.\n\n\n-----\n\n', 'utf8', 'replace')
> u'AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \ufffd For references see blahblah.\n\n\n-----\n\n'
>
> BTW, it's probably in Windows-1252, where it would be a dash.
> Depending on your context it may pay to handle the exception instead of
> using "replace" and attempt interpreting as Windows-1252.
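
The last suggestion in the quoted message (catch the exception and retry as Windows-1252) might look roughly like the sketch below. It is written for Python 3, where `bytes.decode()` takes the place of the `unicode()` call in the session above. The function name `decode_with_fallback` and its parameters are made up for illustration, and the choice to use "replace" on the fallback pass is an assumption, not something the thread prescribes.

```python
def decode_with_fallback(raw, primary="utf-8", fallback="cp1252"):
    """Decode bytes as `primary`; on failure, retry as `fallback`.

    Hypothetical helper: names and defaults are illustrative only.
    """
    try:
        # Strict decode first, so well-formed UTF-8 is left untouched.
        return raw.decode(primary)
    except UnicodeDecodeError:
        # Assume the data is really Windows-1252. A few bytes (e.g. 0x81)
        # are undefined in Python's cp1252 codec, so use "replace" here
        # rather than risk a second exception.
        return raw.decode(fallback, errors="replace")


sample = b"Readings... G. A. \x96 For references see blahblah."
print(repr(decode_with_fallback(sample)))
```

With this approach the stray 0x96 byte comes out as an en dash (U+2013) instead of the U+FFFD placeholder. The trade-off is that any file that fails UTF-8 decoding is treated as Windows-1252 in its entirety, which may or may not suit your data.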

here's one way to explicitly deal with 1252 gremlins:

    http://effbot.org/zone/unicode-gremlins.htm

</F>
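
For readers who cannot reach the link: the general idea behind repairing such "gremlins" can be sketched as below. This is not the code from the linked page, only a minimal Python 3 illustration under the assumption that the text was decoded as ISO-8859-1 (or similar), leaving Windows-1252 bytes stranded in the C1 control range U+0080 to U+009F. The name `fix_gremlins` is hypothetical.

```python
import re

# Characters in the C1 control range; in text that was really
# Windows-1252 these are almost always misdecoded punctuation.
_GREMLINS = re.compile("[\x80-\x9f]")


def _fix(match):
    ch = match.group(0)
    try:
        # Reinterpret the code point as a single Windows-1252 byte.
        return bytes([ord(ch)]).decode("cp1252")
    except UnicodeDecodeError:
        # Byte has no Windows-1252 mapping in Python's codec; keep it.
        return ch


def fix_gremlins(text):
    """Replace stray C1 characters with their Windows-1252 meaning."""
    return _GREMLINS.sub(_fix, text)


misdecoded = b"G. A. \x96 For references".decode("iso-8859-1")
print(repr(fix_gremlins(misdecoded)))
```

Unlike a whole-file fallback, this only touches the suspicious characters, so it can also be applied to text that is otherwise correctly decoded.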