Viktor Dukhovni via Postfix-users:
> On Fri, Oct 06, 2023 at 06:50:38PM -0400, Wietse Venema via Postfix-users wrote:
>
> > +    } else {
> > +        server->username = mystrdup(serverout);
> > +        printable(server->username, '?');
>
> I might note that when UTF8 is enabled, this correctly leaves valid
> UTF-8 characters undisturbed.
>
> However, I also took a close look at printable(), and noticed that it
> admits UTF-8 sequences outside the Unicode range (leader byte
> up to 254), while the maximum valid UTF-8 lead byte tops out at 244
> (0b11110100), for characters in the range U+100000-U+10FFFF.
>
> And also, the number of non-leader bytes is not validated, allowing
> for arbitrarily long runs of 0b10xxxxxx octets after the leader byte.
>
> I think we can do better:

Indeed. But we already have valid_utf8_string(); I'll see if we can
reuse more code between valid_utf8_string() and printable().

	Wietse

> - Limit the leader byte to 244 (which still admits some high code
>   points up to U+13FFFF).
>
> - When the leader byte is followed by too few non-leader bytes,
>   resynchronise by replacing the leader byte with '?' and starting
>   at the next byte.  This produces more sensible results.
>
> - When too many non-leader bytes follow, they're also replaced with
>   '?'.
>
> I put together a patch, and a small number of test code points, which
> include some malformed UTF-8, so not easy to post.  In git, I tagged the
> input file as "binary" for "git commit" purposes.
>
> I doubt "patch" supports the binary file, so I also attached
> "printable.in" separately.
>
> --
>     Viktor.

[ Attachment, skipping... ]
[ Attachment, skipping... ]

_______________________________________________
Postfix-users mailing list -- postfix-users@postfix.org
To unsubscribe send an email to postfix-users-le...@postfix.org
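
The three rules quoted above (cap the leader byte at 244, resynchronise
after a short sequence, replace surplus non-leader bytes) can be sketched
in C roughly as follows. This is only an illustration under stated
assumptions: it is neither the attached patch nor Postfix's printable(),
and the names sanitize_utf8(), utf8_follow_count() and is_continuation()
are hypothetical. Like the proposal, it does not reject overlong
encodings, surrogates, or the code points above U+10FFFF that a 244
leader byte still admits.

```c
#include <stddef.h>

/*
 * Number of non-leader bytes implied by a leader byte, or -1 when the
 * byte is not acceptable as a leader (continuation byte, or above 0xF4).
 */
static int utf8_follow_count(unsigned char c)
{
    if (c >= 0xC0 && c <= 0xDF)
        return 1;
    if (c >= 0xE0 && c <= 0xEF)
        return 2;
    if (c >= 0xF0 && c <= 0xF4)         /* 0xF4 == 244, the proposed cap */
        return 3;
    return -1;
}

/* True for 0b10xxxxxx octets. */
static int is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: replace, in place, ASCII control characters and
 * malformed UTF-8 bytes with 'replacement'. Returns its argument.
 */
char *sanitize_utf8(char *s, int replacement)
{
    unsigned char *p = (unsigned char *) s;

    while (*p != 0) {
        int need;
        int have;

        if (*p < 0x80) {
            /* ASCII: keep printable characters, replace controls and DEL. */
            if (*p < 0x20 || *p == 0x7F)
                *p = (unsigned char) replacement;
            p++;
            continue;
        }
        need = utf8_follow_count(*p);
        if (need < 0) {
            /* Stray non-leader byte (e.g. a surplus one) or leader > 0xF4. */
            *p++ = (unsigned char) replacement;
            continue;
        }
        /* Count non-leader bytes that follow; the NUL terminator stops us. */
        for (have = 0; have < need && is_continuation(p[1 + have]); have++)
            ;
        if (have < need) {
            /* Too few: replace only the leader, resynchronise at next byte. */
            *p++ = (unsigned char) replacement;
            continue;
        }
        /* Complete sequence: leave it undisturbed. */
        p += 1 + need;
    }
    return s;
}
```

With this sketch, a truncated sequence such as 0xE2 0x82 followed by "X"
becomes "??X", and 0xC3 0xA9 0xA9 keeps the first two bytes and turns the
surplus 0xA9 into '?'.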