Re: How to print first(national) char from unicode string encoded in utf-8?

Mark Tolonen Mon, 01 Sep 2008 21:11:50 -0700

"Marco Bizzarri" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

On Mon, Sep 1, 2008 at 3:25 PM,  <[EMAIL PROTECTED]> wrote:


When I do ${urllib.unquote(c.user.firstName)} without encoding to
latin-1 I get different chars than I expect: not Łukasz but Å ukasz
--
http://mail.python.org/mailman/listinfo/python-list


That's crazy. "string".encode('latin1') gives you a latin1 encoded
string; latin1 is a single byte encoding, therefore taking the first
byte should be no problem.

Have you tried:

urllib.unquote(c.user.firstName)[0].encode('latin1') or

urllib.unquote(c.user.firstName)[0].encode('utf8')

I'm assuming here that urllib.unquote(c.user.firstName) returns an
encodable string (which I'm not at all sure of), but if it does, this
should take the first 'character'.

The OP stated that the original string was "encoded in UTF-8 and
urllib.quote()", so after urllib.unquote the string is a UTF-8 encoded
byte string. It must be decoded into a Unicode string before taking the
first character:


   urllib.unquote(c.user.firstName).decode('utf-8')[0]
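
A minimal sketch of why the decode has to come before the indexing. It is
written in Python 3 spelling, where urllib.parse.quote and
urllib.parse.unquote_to_bytes stand in for the urllib.quote and
urllib.unquote used in this thread; that substitution and the variable
names are assumptions for illustration, not something from the OP's code:

```python
from urllib.parse import quote, unquote_to_bytes

# '\u0141' is 'Ł' (LATIN CAPITAL LETTER L WITH STROKE), written as an
# escape so the example does not depend on the source file encoding.
quoted = quote('\u0141ukasz'.encode('utf-8'))

# Unquoting gives back the UTF-8 bytes, not characters.
raw = unquote_to_bytes(quoted)

# Indexing the bytes takes only the first half of the two-byte sequence.
first_byte = raw[:1]

# Decoding first and indexing afterwards takes the whole character.
first_char = raw.decode('utf-8')[0]
```

Here first_byte is only the lead byte of 'Ł' and is not valid UTF-8 on
its own, while first_char is the complete character.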

The next problem is that the character in the OP's example string 'Ł' is
not present in the latin-1 encoding, but using utf-8 encoding
demonstrates that the full two-byte UTF-8 encoded character is collected:


   >>> import urllib
   >>> name = urllib.quote(u'Łukasz'.encode('utf-8'))
   >>> name
   '%C5%81ukasz'
   >>> urllib.unquote(name).decode('utf-8')[0].encode('utf-8')
   '\xc5\x81'
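
The latin-1 half of this can be checked the same way. The following is a
sketch in Python 3 spelling (an assumption; the session above is Python 2),
and the names fits_latin1 and misread are made up for illustration. It
shows that 'Ł' cannot be encoded as latin-1, and that reading the UTF-8
bytes as if they were latin-1 yields 'Å' plus a control character, which
would be consistent with the "Å ukasz" the OP reported:

```python
first_char = '\u0141'  # 'Ł'

# latin-1 has no code point for this character, so encoding raises.
try:
    first_char.encode('latin-1')
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# UTF-8 needs two bytes for it.
as_utf8 = first_char.encode('utf-8')

# Interpreting the UTF-8 bytes of the whole name as latin-1 maps each
# byte to one character: 0xC5 becomes 'Å', 0x81 a C1 control character.
misread = '\u0141ukasz'.encode('utf-8').decode('latin-1')
```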

-Mark
