In article <515be00e$0$29891$c3e8da3$54964...@news.astraweb.com>, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> On Wed, 03 Apr 2013 18:24:25 +1100, Chris Angelico wrote:
>
> > On Wed, Apr 3, 2013 at 6:06 PM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> >> On Wed, Apr 3, 2013 at 12:52 AM, Chris Angelico <ros...@gmail.com>
> >> wrote:
> >>> Hmm. I was about to say "Can you just do a quick collections.Counter()
> >>> of the string widths in 3.3, as an easy way of seeing which ones use
> >>> BMP or higher characters", but I can't find a simple way to query a
> >>> string's width. Can't see it as a method of the string object, nor in
> >>> the string or sys modules. It ought to be easy enough at the C level -
> >>> just look up the two bits representing 'kind' - but I've not found it
> >>> exposed to Python. Is there anything?
> >>
> >> 4 if max(map(ord, s)) > 0xffff else 2 if max(map(ord, s)) > 0xff else 1
> >
> > Yeah, that's iterating over the whole string (twice, if it isn't width
> > 4).
>
> Then don't write it as a one-liner :-P
>
> n = max(map(ord, s))
> 4 if n > 0xffff else 2 if n > 0xff else 1

This has to inspect the entire string, no? I posted (essentially) this a
few days ago:

    if all(ord(c) <= 0xffff for c in s):
        return "it's all bmp"
    else:
        return "it's got astral crap in it"

I'm reasonably sure all() is smart enough to stop at the first False value.

> (sys.getsizeof(s) - sys.getsizeof(''))/len(s)

I wouldn't trust getsizeof() to return exactly what you're looking for.
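For anyone who wants to try the two approaches side by side, here is a minimal runnable sketch. The function names is_all_bmp and width_by_max are made up for illustration and are not part of any library. The default= argument to max() is an addition so the empty string does not raise; it needs Python 3.4 or later.

```python
def is_all_bmp(s):
    """Return True if every code point in s is in the BMP (<= U+FFFF)."""
    # all() short-circuits: it stops consuming the generator at the
    # first character whose code point is above U+FFFF.
    return all(ord(c) <= 0xFFFF for c in s)


def width_by_max(s):
    """Return 1, 2 or 4 depending on the largest code point in s."""
    # max() has to scan the whole string, but only once.
    # default=0 (hypothetical choice) makes the empty string report width 1.
    n = max(map(ord, s), default=0)
    return 4 if n > 0xFFFF else 2 if n > 0xFF else 1


if __name__ == "__main__":
    for sample in ["plain ascii", "caf\u00e9", "\u20ac100", "astral \U0001F600"]:
        print(ascii(sample), is_all_bmp(sample), width_by_max(sample))
```

The all() version can bail out early on a string with an astral character near the front, but it only answers the yes/no question. The max() version always walks the full string and gives the 1/2/4 answer.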