willie wrote:
> Marc 'BlackJack' Rintsch:
>
> >In <[EMAIL PROTECTED]>, willie wrote:
> >> # What's the correct way to get the
> >> # byte count of a unicode (UTF-8) string?
> >> # I couldn't find a builtin method
> >> # and the following is memory inefficient.
> >>
> >> ustr = "example\xC2\x9D".decode('UTF-8')
> >>
> >> num_chars = len(ustr)    # 8
> >>
> >> buf = ustr.encode('UTF-8')
> >>
> >> num_bytes = len(buf)     # 9
> >
> >That is the correct way.
>
> # Apologies if I'm being dense, but it seems
> # unusual that I'd have to make a copy of a
> # unicode string, converting it into a byte
> # string, before I can determine the size (in bytes)
> # of the unicode string. Can someone provide the rational
> # for that or correct my misunderstanding?
>

You initially asked "What's the correct way to get the byte count of a unicode (UTF-8) string". It appears you meant "How can I find how many bytes there are in the UTF-8 representation of a Unicode string without manifesting the UTF-8 representation?".

The answer is, "You can't", and the rationale would have to be that nobody thought of a use case for counting the length of the UTF-8 form but not creating the UTF-8 form. What is your use case?

Cheers,
John
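
For illustration only: no builtin reports the UTF-8 length without encoding, but the count can be derived by hand, since the number of bytes UTF-8 uses for a code point depends only on that code point's value. The following is a minimal sketch, assuming Python 3 (where a plain string literal is already Unicode, so the `.decode` step from the quoted example is not needed). The helper name `utf8_len` is hypothetical, not part of any library.

```python
def utf8_len(text):
    """Count the bytes `text` would occupy in UTF-8 without encoding it.

    Hypothetical helper for illustration; not a builtin.
    """
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:        # U+0000..U+007F: 1 byte
            total += 1
        elif cp < 0x800:     # U+0080..U+07FF: 2 bytes
            total += 2
        elif cp < 0x10000:   # U+0800..U+FFFF: 3 bytes
            total += 3
        else:                # U+10000..U+10FFFF: 4 bytes
            total += 4
    return total


ustr = "example\u009d"       # same text as the quoted example
num_chars = len(ustr)        # number of code points
num_bytes = utf8_len(ustr)   # UTF-8 size, no byte string created
print(num_chars, num_bytes)
```

This avoids allocating the encoded copy, but the loop runs in Python rather than C, so it will probably be slower than `len(ustr.encode('UTF-8'))` for most inputs. It also counts lone surrogates as 3 bytes each, whereas `encode` would reject them.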