Paul Rubin wrote:
> Leif K-Brooks <[EMAIL PROTECTED]> writes:
> > It requires a fairly large change to code and API for a relatively
> > uncommon problem. How often do you need to know how many bytes an
> > encoded Unicode string takes up without needing the encoded string
> > itself?
>
> Shrug. I don't see a real large change--the code would just check for
> an optional arg and process accordingly. I don't know if the issue
> comes up often enough to be worth making such accommodations for. I do
> know that we had an extensive newsgroup thread about it, from which
> this discussion came, but I haven't paid that much attention.

Actually, what Willie was concerned about was some cockamamie DBMS which
required to be fed Unicode, which it encoded as UTF-8, but silently
truncated if it was more than the n in varchar(n) ... or something like
that. So all he needs is a boolean result:

    u.willitfit(encoding, width)

This can of course be optimised with simple early-loop-exit tests:

    if n_bytes_so_far + n_remaining_uchars > width:
        return False
    elif n_bytes_so_far + n_remaining_uchars * M <= width:
        return True
    # where M is the maximum #bytes per Unicode char for the encoding
    # that's being used.

Tell you what, why don't you and Willie get together and write a PEP?

Cheers,
John
-- 
http://mail.python.org/mailman/listinfo/python-list
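
For illustration only, the early-loop-exit idea above could be sketched as a standalone Python function roughly like this. The name will_it_fit and the max_bytes_per_char parameter (the M above) are hypothetical, not an existing str method or a proposed API. The sketch assumes a stateless encoding with no byte-order mark in which every character encodes to between 1 and max_bytes_per_char bytes, as is the case for UTF-8 with a maximum of 4.

```python
def will_it_fit(text, encoding, width, max_bytes_per_char=4):
    """Return True if text.encode(encoding) would take at most width bytes.

    Hypothetical helper. Assumes each character encodes independently
    to between 1 and max_bytes_per_char bytes (true for UTF-8 with 4).
    """
    n_bytes_so_far = 0
    n_remaining_uchars = len(text)
    for ch in text:
        # Lower bound: every remaining char needs at least one byte.
        if n_bytes_so_far + n_remaining_uchars > width:
            return False
        # Upper bound: even if every remaining char takes the maximum,
        # the result still fits, so there is no need to look further.
        if n_bytes_so_far + n_remaining_uchars * max_bytes_per_char <= width:
            return True
        n_bytes_so_far += len(ch.encode(encoding))
        n_remaining_uchars -= 1
    # All characters consumed: compare the exact byte count.
    return n_bytes_so_far <= width
```

For short strings the plain len(text.encode(encoding)) <= width is simpler and probably fast enough; the early exits only pay off when the string is much longer or much shorter than the limit.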