Viktor Dukhovni via Postfix-users:
> On Fri, Oct 06, 2023 at 06:50:38PM -0400, Wietse Venema via Postfix-users
> wrote:
>
> > + } else {
> > + server->username = mystrdup(serverout);
> > + printable(server->username, '?');
>
> I might note that when UTF8 is enabled, this correctly leaves valid
> UTF-8 characters undisturbed.
>
> However, I also took a close look at printable(), and noticed that it
> admits UTF-8 sequences for code points outside the Unicode range (leader
> byte up to 254), while the maximum valid UTF-8 lead byte tops out at 244
> (0b11110100), for characters in the range U+100000 to U+10FFFF.
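To see where the 244 ceiling comes from, here is a minimal sketch of the standard UTF-8 encoder (utf8_encode() is a hypothetical name, not a Postfix API). A four-byte sequence carries bits 18-20 of the code point in the low bits of the 11110xxx lead byte, so U+10FFFF, the last Unicode code point, encodes as F4 8F BF BF and no valid character needs a lead byte above 0xF4. The sketch assumes its input is at most U+10FFFF and does not reject surrogates.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: encode one code point as UTF-8 into out[],
 * returning the number of bytes written. Assumes cp <= U+10FFFF;
 * surrogates (U+D800..U+DFFF) are not rejected.
 */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFFUL);

    if (cp < 0x80) {                            /* 0xxxxxxx */
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {                           /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                         /* 1110xxxx + 2 bytes */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx + 3 bytes: cp >> 18 is at most 4 here, so lead <= 0xF4. */
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}
```

Conversely, a lead byte of 0xF5 would only be needed for code points from U+140000 up, and F4 BF BF BF decodes to U+13FFFF, which is why capping the lead byte alone still lets some out-of-range values through.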
>
> Also, the number of non-leader bytes is not validated, allowing
> arbitrarily long runs of 0b10xxxxxx octets after the leader byte.
>
> I think we can do better:
Indeed. But we already have valid_utf8_string(); I'll see if we
can reuse more code between valid_utf8_string() and printable().
Wietse
> - Limit the leader byte to 244 (which still admits some high code
> points up to U+13FFFF).
>
> - When the leader byte is followed by too few non-leader bytes,
> resynchronise by replacing the leader byte with '?' and starting
> at the next byte. This produces more sensible results.
>
> - When too many non-leader bytes follow, they're also replaced with
> '?'.
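The three rules above can be sketched in C roughly as follows. This is a sketch under stated assumptions, not the attached patch (which isn't reproduced here) and not Postfix's printable(): utf8_printable_sketch() and utf8_trailing_count() are hypothetical names, the function takes the same (string, replacement) arguments as the printable() call in the patch, ASCII control characters and DEL are assumed to count as unprintable, and, like the proposal, it caps the lead byte at 244 but does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <string.h>

/* A byte of the form 10xxxxxx (a UTF-8 continuation byte). */
#define IS_CONT(c) (((c) & 0xC0) == 0x80)

/*
 * Hypothetical helper: how many continuation bytes a lead byte announces,
 * or -1 if the byte cannot start a sequence. Lead bytes are capped at
 * 0xF4 (244); overlong forms and surrogates are NOT rejected.
 */
static int utf8_trailing_count(unsigned char c)
{
    if (c >= 0xC0 && c <= 0xDF)
        return 1;                               /* 110xxxxx */
    if (c >= 0xE0 && c <= 0xEF)
        return 2;                               /* 1110xxxx */
    if (c >= 0xF0 && c <= 0xF4)
        return 3;                               /* 11110xxx, at most 244 */
    return -1;                                  /* 10xxxxxx or 0xF5..0xFF */
}

/*
 * Hypothetical sketch: overwrite, in place, every byte that is neither
 * printable ASCII nor part of a complete UTF-8 sequence.
 */
static char *utf8_printable_sketch(char *s, int replacement)
{
    unsigned char *cp = (unsigned char *) s;
    unsigned char *end = cp + strlen(s);
    int     n;
    int     i;

    /* An ASCII replacement keeps the output itself well-formed. */
    assert(replacement > 0 && replacement < 0x80);

    while (cp < end) {
        if (*cp < 0x80) {
            /* ASCII: keep 0x20..0x7E, replace controls and DEL. */
            if (*cp < 0x20 || *cp == 0x7F)
                *cp = (unsigned char) replacement;
            cp++;
            continue;
        }
        if ((n = utf8_trailing_count(*cp)) < 0) {
            /* Stray continuation byte (e.g. one too many after a
             * complete sequence) or a lead byte above 244. */
            *cp++ = (unsigned char) replacement;
            continue;
        }
        /* Count how many of the n expected continuation bytes follow. */
        for (i = 1; i <= n && cp + i < end && IS_CONT(cp[i]); i++)
            /* void */ ;
        if (i <= n) {
            /* Too few: replace only the lead byte and resynchronise at
             * the next byte, which is examined on its own. */
            *cp++ = (unsigned char) replacement;
            continue;
        }
        cp += n + 1;                            /* complete sequence: keep */
    }
    return (s);
}
```

For example, "\xC3" "A" (a two-byte lead byte followed by plain ASCII) becomes "?A", so the 'A' survives resynchronisation, while "\xC3\xA9\xA9" keeps the é and turns the surplus continuation byte into '?'.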
>
> I put together a patch and a small number of test code points, which
> include some malformed UTF-8, so they are not easy to post inline. In
> git, I tagged the input file as "binary" for "git commit" purposes.
>
> I doubt "patch" supports the binary file, so I also attached
> "printable.in" separately.
>
> --
> Viktor.
[ Attachment, skipping... ]
[ Attachment, skipping... ]
> _______________________________________________
> Postfix-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]