On Tue, Jul 30, 2013 at 8:09 PM, <wxjmfa...@gmail.com> wrote:
> Matable, immutable, copyint + xxx, bufferint, O(n) ....
> Yes, but conceptualy the reencoding happen sometime, somewhere.
> The internal "ucs-2" will never automagically be transformed
> into "ucs-4" (eg).

But probably not on the entire document. With even a brainless scheme
like I posted code for, no more than 1024 bytes will need to be
recoded at a time (except in some odd edge cases, and even then, no
more than once for any given file).

> And do not forget, in a pure utf coding scheme, your
> char or a char will *never* be larger than 4 bytes.
>
> >>> sys.getsizeof('a')
> 26
> >>> sys.getsizeof('\U000101000')
> 48

Yeah, you have a few odd issues like, oh, I dunno, GC overhead,
reference count, object class, and string length, all stored somewhere
there. Honestly jmf, if you want raw assembly you know where to get
it.

ChrisA
--
http://mail.python.org/mailman/listinfo/python-list
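
To put rough numbers on the overhead point, here is a minimal sketch that separates the fixed per-object cost from the per-character cost. It assumes CPython 3.3 or later (the flexible string representation); other implementations may report different figures. The helper names per_char_bytes and fixed_overhead are made up for this illustration. Comparing two strings of different lengths cancels out the fixed header, so what is left is the storage per character.

```python
import sys

def per_char_bytes(ch, n=10):
    """Estimate storage per character for strings made only of `ch`.

    Two freshly built strings of different lengths share the same fixed
    header, so the size difference is purely character storage.
    """
    short = ch * n
    longer = ch * (2 * n)
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // n

def fixed_overhead(ch, n=10):
    """Everything in the object that is not character storage.

    This covers the object header (reference count, type pointer,
    length, hash, flags) plus the terminating NUL.
    """
    s = ch * n
    return sys.getsizeof(s) - per_char_bytes(ch, n) * n

if __name__ == "__main__":
    samples = [
        ("ASCII", "a"),
        ("BMP", "\u0394"),         # GREEK CAPITAL LETTER DELTA
        ("astral", "\U00010100"),  # a code point above U+FFFF
    ]
    for label, ch in samples:
        print(label, "bytes per char:", per_char_bytes(ch),
              "fixed overhead:", fixed_overhead(ch))
```

Under that assumption the per-character cost comes out as 1, 2 and 4 bytes for the three samples. The fixed part depends on the build (32-bit vs 64-bit) and the Python version, which is why a bare sys.getsizeof() on a one-character string says very little about the encoding itself.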