On 12.08.2024 at 19:15, Nick Lockheart wrote:

> One report is: https://www.unicode.org/reports/tr36
>
> There are several things in their guide.
>
> They recommend that illegal byte sequences not be deleted, because
> deletion can create an attack vector: two bytes that were separated
> by an illegal sequence end up adjacent once it is removed, and can
> form something new *after* the program has already checked for
> dangerous characters:
>
> https://www.unicode.org/reports/tr36/#SecureEncodingConversion
>
>
> In PHP, you should be able to replace (rather than delete) them with:
>
> $ScrubbedBody = mb_scrub($_POST['body'], 'UTF-8');

I suggest *validating* rather than *sanitizing*.  If a malicious user
submits illegal UTF-8, just reject the request right away.  Regular
users' clients don't send ill-formed UTF-8, so they shouldn't even
notice this.
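
A minimal sketch of the validating approach, assuming the mbstring
extension is loaded.  The helper name validated_body() and the 400
response in the usage comment are illustrative choices, not something
prescribed by TR36 or by PHP:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns the input unchanged if it is a
// well-formed UTF-8 string, or null if it should be rejected.
function validated_body($raw): ?string
{
    // A field submitted as body[]=... arrives as an array, and a
    // missing field arrives as null; neither is an acceptable body.
    if (!is_string($raw)) {
        return null;
    }

    // mb_check_encoding() only inspects the bytes; it never alters them.
    if (!mb_check_encoding($raw, 'UTF-8')) {
        return null;
    }

    return $raw;
}

// Possible usage in a request handler (assumed context):
//
//     $body = validated_body($_POST['body'] ?? null);
//     if ($body === null) {
//         http_response_code(400);
//         exit('Bad Request');
//     }
```

Because nothing is removed or rewritten, the string that passes the
check is byte-for-byte the string that later filters see.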

Christoph
