nntplib encoding problem

2011-02-27 Thread Laurent Duchesne

Hi,

I'm using Python 3.2 and got the following error:


nntpClient = nntplib.NNTP_SSL(...)
nntpClient.group("alt.binaries.cd.lossless")
nntpClient.over((534157,534157))
... 'subject': 'Myl\udce8ne Farmer - Anamorphosee (Japan Edition) 1995 [02/41] "Back.jpg" yEnc (1/3)' ...

overview = nntpClient.over((534157,534157))
print(overview[1][0][1]['subject'])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce8' in position 3: surrogates not allowed


I'm not sure if I should report this as a bug in nntplib or if I'm 
doing something wrong.


Note that I get the same error if I try to write this data to a file:


h = open("output.txt", "a")
h.write(overview[1][0][1]['subject'])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce8' in position 3: surrogates not allowed


Thanks,
Laurent
--
http://mail.python.org/mailman/listinfo/python-list


Re: nntplib encoding problem

2011-02-28 Thread Laurent Duchesne

Hi,

Thanks, it's working!
But is it "normal" for a string returned by a module (nntplib) to 
raise an exception when passed to print or write?


I'm just asking to know if I should open a bug report or not :)

I'm also wondering which strings should be re-encoded using the 
surrogateescape parameter and which should not. I guess I could 
re-encode them all and it wouldn't cause any problems?
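
On the "re-encode them all" idea, a minimal sketch may help. It assumes
the fix in question is encode("utf-8", "surrogateescape") followed by
decode("latin-1"), and that Latin-1 really is the original encoding. The
helper name repair_surrogates and its fallback_encoding parameter are
hypothetical, not part of nntplib. The sketch suggests that blanket
re-encoding is not harmless: a string that was already decoded correctly
from UTF-8 gets mangled, so it seems safer to repair only strings that
fail a strict UTF-8 encode.

```python
def repair_surrogates(text, fallback_encoding="latin-1"):
    """Return text unchanged if it is clean; otherwise re-decode its raw bytes.

    fallback_encoding is an assumption about what the sender really used.
    """
    try:
        # A clean str encodes to strict UTF-8 without error: leave it alone.
        text.encode("utf-8")
        return text
    except UnicodeEncodeError:
        # Lone surrogates U+DC80..U+DCFF stand for undecodable bytes.
        # surrogateescape turns them back into those original bytes.
        raw = text.encode("utf-8", "surrogateescape")
        return raw.decode(fallback_encoding)


# A subject that was decoded correctly (no surrogates).
clean = "Myl\u00e8ne Farmer"

# Blindly applying the fix to a clean string produces mojibake:
# the two UTF-8 bytes of U+00E8 are read as two Latin-1 characters.
blind = clean.encode("utf-8", "surrogateescape").decode("latin-1")

# The guarded helper leaves the clean string alone and repairs the bad one.
kept = repair_surrogates(clean)
repaired = repair_surrogates("Myl\udce8ne Farmer")
```

Note that a header mixing valid UTF-8 sequences with stray Latin-1 bytes
would still come out partly mangled, because the whole string is
re-decoded with the fallback encoding.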


Laurent

On Mon, 28 Feb 2011 02:12:20 +, MRAB wrote:

On 28/02/2011 01:31, Laurent Duchesne wrote:

[snip]

It looks like the subject was originally encoded as Latin-1 (or
similar) (b'Myl\xe8ne Farmer - Anamorphosee (Japan Edition) 1995
[02/41] "Back.jpg" yEnc (1/3)') but has been decoded as UTF-8 with
"surrogateescape" passed as the "errors" parameter.

You can get the "correct" Unicode by encoding as UTF-8 with
"surrogateescape" and then decoding as Latin-1:

overview[1][0][1]['subject'].encode("utf-8", "surrogateescape").decode("latin-1")
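
The round trip described above can be reproduced without a news server.
The following is a minimal sketch that starts from the raw bytes quoted
in this message; the variable names are illustrative only, and Latin-1
is an assumption about the sender's encoding, as noted above.

```python
# Raw subject bytes as quoted in the thread; 0xE8 is "e with grave" in Latin-1.
raw = b'Myl\xe8ne Farmer - Anamorphosee (Japan Edition) 1995 [02/41] "Back.jpg" yEnc (1/3)'

# Decoding as UTF-8 with surrogateescape cannot interpret byte 0xE8,
# so it is smuggled into the str as the lone surrogate U+DCE8.
decoded = raw.decode("utf-8", "surrogateescape")

# A strict UTF-8 encode (what print and file.write do on a UTF-8 stream)
# rejects the lone surrogate.
try:
    decoded.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# Encoding with surrogateescape restores the original bytes exactly;
# decoding those bytes as Latin-1 yields ordinary, printable text.
recovered = decoded.encode("utf-8", "surrogateescape")
fixed = recovered.decode("latin-1")

print(fixed)
```

The lone surrogate sits at index 3 of the decoded string, which matches
the "position 3" reported in the traceback.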

