Re: Could you verify this, Oh Great Unicode Experts of the Python-List?

Chris Angelico Sun, 11 Aug 2013 02:04:48 -0700

On Sun, Aug 11, 2013 at 7:17 AM, Joshua Landau <[email protected]> wrote:
> Given tweet = b"caf\x65\xCC\x81".decode():
>
>     >>> tweet
>     'café'
>
> But:
>
>     >>> len(tweet)
>     5


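You're now looking at the difference between glyphs and combining
characters: the "é" here is a plain "e" followed by U+0301 COMBINING
ACUTE ACCENT, so it is two code points even though it displays as one.
Twitter counts combining characters, so when you build one "thing" out
of several separately-typed parts, it does count as more characters.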
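A minimal sketch using only the standard library's unicodedata module shows where the extra character comes from. It assumes Python 3 and UTF-8 input, as in the quoted example. The helper approx_visible_length is hypothetical: it only skips code points with a non-zero combining class, which approximates but does not implement full grapheme-cluster segmentation (UAX #29).

```python
import unicodedata

# The bytes from the quoted example: "caf" + "e" (0x65) + U+0301
# COMBINING ACUTE ACCENT, which UTF-8 encodes as 0xCC 0x81.
tweet = b"caf\x65\xCC\x81".decode("utf-8")

# len() counts code points, so the base "e" and the accent count separately.
print(len(tweet))  # → 5
print([unicodedata.name(ch) for ch in tweet[3:]])
# → ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']

# NFC normalization composes "e" + U+0301 into the single code point U+00E9.
composed = unicodedata.normalize("NFC", tweet)
print(len(composed))  # → 4

# Hypothetical helper: a rough count of user-perceived characters that
# ignores any code point with a non-zero canonical combining class.
# This is an approximation, not real grapheme-cluster segmentation.
def approx_visible_length(s):
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

print(approx_visible_length(tweet))  # → 4
```

Normalizing to NFC only helps when a precomposed code point exists; sequences with no precomposed form still count as several code points, which is why a true "visible length" needs grapheme-cluster rules rather than normalization alone.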

Read this article for some arguments on the subject, including a
number of references to Twitter itself:

http://unspecified.wordpress.com/2012/04/19/the-importance-of-language-level-abstract-unicode-strings/

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list
