On Sun, Aug 11, 2013 at 7:17 AM, Joshua Landau <jos...@landau.ws> wrote:
> Given tweet = b"caf\x65\xCC\x81".decode():
>
> >>> tweet
> 'café'
>
> But:
>
> >>> len(tweet)
> 5
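
For what it's worth, here is a minimal sketch of where that 5 comes from, using only the standard-library unicodedata module. The names code_points, composed and base_count are just illustrative, and skipping code points with a nonzero combining class is only a rough stand-in for "what the user sees as one character", not full grapheme segmentation.

```python
import unicodedata

# The string from the quoted message: "caf", then "e" (0x65),
# then the UTF-8 bytes CC 81, which decode to U+0301.
tweet = b"caf\x65\xCC\x81".decode("utf-8")

# Show every code point with its Unicode name and combining class.
for ch in tweet:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"(combining class {unicodedata.combining(ch)})")

# len() counts code points, so the accent is counted separately.
code_points = len(tweet)  # 5

# NFC normalization composes "e" + U+0301 into the single code point U+00E9.
composed = unicodedata.normalize("NFC", tweet)

# Approximation: count only code points that are not combining marks.
base_count = sum(1 for ch in tweet if unicodedata.combining(ch) == 0)

print(code_points, len(composed), base_count)
```

Normalizing to NFC only helps when a precomposed form exists, so the two counts can still disagree for other combining sequences.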
You're now looking at the difference between glyphs and combining characters. Twitter counts combining characters, so when you build one "thing" out of lots of separately-typed parts, it does count as more characters.

Read this article for some arguments on the subject, including a number of references to Twitter itself:

http://unspecified.wordpress.com/2012/04/19/the-importance-of-language-level-abstract-unicode-strings/

ChrisA