Eric Blake <ebl...@redhat.com> writes: > On 08/10/2018 09:40 AM, Markus Armbruster wrote: > >>>> + cp = mod_utf8_codepoint(ptr, 6, &end); >>> >>> Why are you hard-coding 6 here, rather than computing min(6, >>> strchr(ptr,0)-ptr)? If the user passes an invalid sequence at the end >>> of the string, can we end up making mod_utf8_codepoint() read beyond >>> the end of our string? Would it be better to just always pass the >>> remaining string length (mod_utf8_codepoint() only cares about >>> stopping short of 6 bytes, but never reads beyond there even if you >>> pass a larger number)? >> >> mod_utf8_codepoint() never reads beyond '\0'. The second parameter >> exists only so you can further limit reads. I like to provide that >> capability, because it sometimes saves a silly substring copy. > > Okay. Perhaps the comments on mod_utf8_codepoint() could make that > more clear that the contract is not violated (I didn't spot it without > a close re-read of the code, prompted by your reply). But that's > possibly a separate patch.
Well, the contract says @s is a string, and that means no access beyond the terminating null character is permitted. Perhaps too subtle. My contracts often are. [...]