Hi, thanks for the reply.

However, I get strange behavior when I try to feed text that must be Unicode to AltaVista for translation.
Just before sending, I get the following in the log using

print "RECV DATA: ", repr(data)

and after entering "então" ("so" in Portuguese)

RECV DATA:  'right: ent\xc3\xa3o?'
Sent Message to Client Nr.  1
CONTENT:  ['right', ' ent\xc3\xa3o?']

Just before the CONTENT printout above, there is a data.split(":").

Now, right before sending the data to be translated by AltaVista, I print CONTENT[1], which yields:

Translating:   então?

Which I find odd. Obviously, feeding this into Babel Fish results in a failed translation. So before sending, I try to encode it as you suggested.

try:
 print "Translating: ", content[1]
 decoded = content[1].encode('utf8')
 print "Decoding Prior to Translating: ", decoded
except Exception, e:
 print "EXCEPTION ENCODING ", e

try:
 translated = translate(decoded, src_l, dest_l)
except Exception, e:
 print "EXCEPTION TRANSLATING ", e
 translated = "translation failed"


The Exception thrown is:

EXCEPTION ENCODING 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)


I was dealing with an ASCII string and asking for it to be encoded as UTF-8, whereas Python is telling me it can't encode it as UTF-8? That makes little sense to me.

Chrs
j.


From: Kent Johnson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
CC: tutor@python.org
Subject: Re: [Tutor] i18n on Entry widgets
Date: Wed, 17 Aug 2005 13:27:24 -0400

Jorge Louis de Castro wrote:
Hi,

How do I set the encoding of a string? I'm reading a string from an Entry widget, and it may contain accents and other special characters from languages other than English. When I send the string through a socket, the socket is automatically closed. Is there a way to encode any special characters in a string?

First you have to know what the encoding is of the string you get from the Entry. IIRC a Tkinter widget will give you an ASCII string if possible, otherwise a Unicode string. You could check this by
 print repr(data)
where data is the string you get from the Entry.

Next you have to encode the unicode string to the encoding you want on the socket. If you want utf-8, you would use
 socket_data = data.encode('utf-8')
This will work if data is ASCII or Unicode. There are many other supported encodings; see http://docs.python.org/lib/standard-encodings.html for a list.
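
One case this does not cover is a byte string that already holds non-ASCII bytes, such as 'ent\xc3\xa3o' read from a socket. In Python 2, calling encode() on such a string first triggers an implicit ASCII decode, and that decode fails on the first non-ASCII byte. Below is a minimal sketch of a helper that accepts either kind of input. The name to_utf8_bytes is hypothetical, it assumes any incoming byte string is meant to be UTF-8, and it is written to run on Python 2.6 or later as well as Python 3.

```python
def to_utf8_bytes(data):
    """Return data as UTF-8 encoded bytes (hypothetical helper)."""
    if isinstance(data, bytes):
        # Already a byte string (plain str on Python 2).
        # Assumption: the bytes are UTF-8. Decoding checks that and
        # raises UnicodeDecodeError if they are not.
        data.decode('utf-8')
        return data
    # Text (unicode on Python 2, str on Python 3): encode it.
    return data.encode('utf-8')


# A Unicode string, as an Entry widget may return for accented input.
print(repr(to_utf8_bytes(u'ent\xe3o')))

# A byte string that is already UTF-8, as received from a socket.
print(repr(to_utf8_bytes(b'ent\xc3\xa3o')))
```

Both calls should give the same five-character word as six UTF-8 bytes. If the bytes need to be handled as text, for example to strip or compare them, data.decode('utf-8') gives a Unicode string.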

Kent


_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
