nicolas_riesch wrote:
> I just don't understand why it returns the "length consumed".
>
> Does it means that in some case, the input string can be only partially
> converted ?

For an encoder, I believe the answer is "no". For a decoder, it is a
definite yes: if the input does not end with a complete character, you
may have bytes left at the end which did not get decoded.

For an encoder, the same *might* happen if you want to encode
half-surrogates into, say, UTF-8; the encoder might refuse to encode
the half-surrogate, and wait for the other half. In practice, though,
the current UTF-8 encoder does not do that: it just encodes the
surrogate code point as if it were a proper character.

If you extend the notion of "encoding", similar things may happen all
the time. E.g. a DES encoder may only support inputs that are a
multiple of the block size, and leave bytes at the end.

> What can be the use of the "length consumed" value ?

It's primarily intended for stream writers, which may need to buffer
extra characters at the end that did not get encoded, and wait until
more input is provided.

For all practical purposes, you can ignore the length on encoding. If
you are paranoid, assert that it equals the length of the input.

Regards,
Martin
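
To illustrate the decoder case described above, here is a minimal sketch. It assumes Python 3 and CPython's low-level codecs.utf_8_decode and codecs.utf_8_encode functions; the third argument to the decoder (final) is what lets it leave an incomplete trailing sequence unconsumed. The sample string and the variable names are made up for the example.

```python
import codecs

# Sample data (hypothetical): "héllo" encoded as UTF-8 is b'h\xc3\xa9llo'.
data = "h\u00e9llo".encode("utf-8")
chunk = data[:2]  # b'h\xc3' stops in the middle of the two-byte 'é'

# final=False says more input may follow, so the incomplete trailing
# byte is left unconsumed instead of raising an error.
text, consumed = codecs.utf_8_decode(chunk, "strict", False)
leftover = chunk[consumed:]  # bytes the caller has to buffer

# Prepend the buffered bytes to the rest of the input and finish decoding.
text2, consumed2 = codecs.utf_8_decode(leftover + data[2:], "strict", True)

# On the encoding side the whole input is consumed.
encoded, enc_consumed = codecs.utf_8_encode("h\u00e9llo")
assert enc_consumed == len("h\u00e9llo")

print(repr(text), consumed, leftover, repr(text2), consumed2)
```

The leftover bytes are exactly what a stream reader would need to hold back until more input arrives; the final assert is the "paranoid" check on the encoder.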