Timofei Zhakov <[email protected]> writes: > Committed the test in r1934405 and the rest of the patch in r1934407. > Removed svn_utf__cstring_is_valid() check in r1934408.
Thanks for the change! I like the idea, and the implementation looks good to me overall. I'm a bit late to the party, but here are a few minor comments and suggestions regarding the code: + /* Convert the UTF-8 string to UTF-32 (UCS4) which is the format + * utf8proc_charwidth() expects, and get the width of each character. + * We don't need much error checking since the input is valid UTF-8. */ + while (*cstr) After r1934408, the comment stating that "the input is valid UTF-8" seems outdated, because we no longer check for valid UTF-8 in advance. + int nbytes = utf8proc_iterate((apr_byte_t*)cstr, -1, &ucs); Since `cstr` is a `const char *`, I think this cast unnecessarily discards constness. Also, because utf8proc_iterate() expects a `const utf8proc_uint8_t *`, I'd say that casting directly to that target type is safer than using `apr_byte_t *`, because the latter could theoretically denote a different underlying type. A couple of similar observations: - utf8proc_iterate() returns a utf8proc_ssize_t. Assigning this to `int nbytes` risks truncation in environments where int is smaller than utf8proc_ssize_t: + int nbytes = utf8proc_iterate((apr_byte_t*)cstr, -1, &ucs); - utf8proc_iterate() puts the codepoint result in a utf8proc_int32_t, whereas we currently type it as an apr_int32_t: + apr_int32_t ucs; While those types are almost certainly compatible, I think that changing it to `utf8proc_int32_t` would guarantee that we use a correct type. It would also ensure that `utf8proc_charwidth(utf8proc_int32_t c)` gets called with an argument of the exactly expected type. Thanks, Evgeny Kotkov

