On Wed, Apr 3, 2013 at 9:02 AM, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> On Wed, 03 Apr 2013 09:43:06 -0400, Roy Smith wrote:
>
> [...]
>>> n = max(map(ord, s))
>>> 4 if n > 0xffff else 2 if n > 0xff else 1
>>
>> This has to inspect the entire string, no?
>
> Correct. A more efficient implementation would be:
>
> def char_size(s):
>     for n in map(ord, s):
>         if n > 0xFFFF: return 4
>         if n > 0xFF: return 2
>     return 1

That's an incorrect implementation: it returns 2 at the first non-Latin-1 BMP character, even if there are SMP characters later in the string. It's only safe to short-circuit when returning 4. A result of 2 or 1 is only known once the whole string has been scanned.
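
For what it's worth, here is a minimal sketch of a version that keeps the early exit only where it is safe. It assumes the same 1/2/4 width classes as the quoted code and a Python 3 build where ord() returns the full code point; the local name `size` is mine, not from the thread.

```python
def char_size(s):
    """Return the per-character width in bytes (1, 2 or 4) needed for s."""
    size = 1
    for n in map(ord, s):
        if n > 0xFFFF:
            # Nothing can be wider than 4, so stopping here is safe.
            return 4
        if n > 0xFF:
            # Remember the wider class, but keep scanning in case a
            # character outside the BMP shows up later.
            size = 2
    return size
```

The worst case is still a full scan (any string with no SMP characters), so this only helps when a 4-byte character appears early.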