Paolo Bonzini wrote:
> "u8_mbtouc will never access more than N bytes. However, as an
> additional guarantee, u8_mbtouc only accesses as many bytes as necessary
> to decode the first Unicode character, or to ascertain that S does not
> begin with a valid UTF-8 sequence."

This is complicated to understand, because it requires the programmer to
understand how a Unicode character is parsed.

> > The code may be changed in the future. If a guarantee is not documented AND
> > checked by the test suite, you cannot rely on it.
>
> Of course, that's why I'm suggesting a modification to the specification.

What's the use case that would profit from such a guarantee?

libunistring supports two string data types: one where the length of the
string (number of units) is known, and one which is U+0000 terminated. Are
you suggesting that these two data types are not sufficient to cover the
users' needs?

If your only point is to save a couple of instructions, then it's too small
a benefit, in my opinion.

Bruno
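
For reference, the guarantee proposed in the quoted text can be illustrated with a minimal, self-contained sketch. This is not libunistring's code: `sketch_mbtouc` is a hypothetical helper, and unlike `u8_mbtouc` its return value is the number of bytes it had to read, not the number of bytes consumed. It assumes the well-formed byte ranges of the Unicode Standard and stops at the first byte that settles the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of libunistring.
   Decodes the first character of s[0..n-1] into *puc (U+FFFD if the
   input does not start with a valid UTF-8 sequence) and returns the
   number of bytes it actually accessed. */
static size_t
sketch_mbtouc (uint32_t *puc, const uint8_t *s, size_t n)
{
  assert (n > 0);

  uint8_t c = s[0];
  uint8_t lo = 0x80, hi = 0xBF;   /* allowed range of the next byte */
  size_t len;
  uint32_t uc;

  if (c < 0x80)
    {
      *puc = c;
      return 1;
    }
  else if (c >= 0xC2 && c <= 0xDF)
    {
      len = 2; uc = c & 0x1F;
    }
  else if (c >= 0xE0 && c <= 0xEF)
    {
      len = 3; uc = c & 0x0F;
      if (c == 0xE0) lo = 0xA0;   /* rejects overlong forms */
      if (c == 0xED) hi = 0x9F;   /* rejects surrogates */
    }
  else if (c >= 0xF0 && c <= 0xF4)
    {
      len = 4; uc = c & 0x07;
      if (c == 0xF0) lo = 0x90;   /* rejects overlong forms */
      if (c == 0xF4) hi = 0x8F;   /* rejects values above U+10FFFF */
    }
  else
    {
      *puc = 0xFFFD;              /* invalid lead byte: one byte read */
      return 1;
    }

  for (size_t i = 1; i < len; i++)
    {
      if (i == n)
        {
          *puc = 0xFFFD;          /* truncated: never touches s[n] */
          return i;
        }
      if (s[i] < lo || s[i] > hi)
        {
          *puc = 0xFFFD;          /* s[i] was read to find it invalid */
          return i + 1;
        }
      uc = (uc << 6) | (s[i] & 0x3F);
      lo = 0x80; hi = 0xBF;       /* later bytes use the plain range */
    }

  *puc = uc;
  return len;
}
```

With this sketch, the three-byte input E2 'x' 'y' is rejected after reading two bytes, and a buffer holding only the lead byte E2 with n = 1 is rejected after reading one.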