On 08/10/2018 09:40 AM, Markus Armbruster wrote:

+            cp = mod_utf8_codepoint(ptr, 6, &end);

Why are you hard-coding 6 here, rather than computing min(6,
strchr(ptr,0)-ptr)?  If the user passes an invalid sequence at the end
of the string, can we end up making mod_utf8_codepoint() read beyond
the end of our string?  Would it be better to just always pass the
remaining string length (mod_utf8_codepoint() only cares about
stopping short of 6 bytes, but never reads beyond there even if you
pass a larger number)?

mod_utf8_codepoint() never reads beyond '\0'.  The second parameter
exists only so you can further limit reads.  I like to provide that
capability, because it sometimes saves a silly substring copy.

Okay. Perhaps the comments on mod_utf8_codepoint() could make that more clear that the contract is not violated (I didn't spot it without a close re-read of the code, prompted by your reply). But that's possibly a separate patch.


+    if (codepoint > 0 && codepoint <= 0x7F) {
+        buf[0] = codepoint & 0x7F;

Dead use of binary &. But acceptable for symmetry with the other code
branches.

Exactly as dead as ...


+    buf[0] = 0xF0 | ((codepoint >> 18) & 0x07);

... even this one.

The last one only because is_valid_codepoint() rejects codepoints >
0x10FFFFu, which is admittedly a non-local argument.

I'm debating whether to keep or drop the redundant masking.  Got a
preference?

No strong preference. A compiler with good range propagation during optimization should be able to eliminate the dead mask from the emitted assembly.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Reply via email to