On 12.08.2024 at 19:15, Nick Lockheart wrote:

> One report is: https://www.unicode.org/reports/tr36
>
> There's several things in their guide.
>
> They recommend that illegal byte sequences not be deleted as this can
> create an attack vector where two bytes that fit together are split by
> an illegal sequence, that, once removed, puts the two bytes back
> together to make something new, *after* the program has checked for
> dangerous characters:
>
> https://www.unicode.org/reports/tr36/#SecureEncodingConversion
>
>
> In PHP, you should be able to do that with:
>
> $ScrubbedBody = mb_scrub($_POST['body'], 'UTF-8');

I suggest *validating* rather than *sanitizing*. If a malicious user submits illegal UTF-8, just reject the request right away. Regular users shouldn't even notice this, because well-behaved clients don't send malformed UTF-8 in the first place.

Christoph
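
For illustration, here is a minimal sketch of the validate-and-reject approach, assuming the mbstring extension is loaded. The helper name is_valid_utf8() and the 400 response are assumptions made for this example only and are not taken from the thread or from TR36.

```php
<?php
// Hypothetical helper (not part of PHP): true only for strings that are
// well-formed UTF-8. Non-string input, such as an array submitted as
// body[]=x, is treated as invalid.
function is_valid_utf8($value)
{
    return is_string($value) && mb_check_encoding($value, 'UTF-8');
}

// Validate before any other processing, and reject instead of repairing.
$body = $_POST['body'] ?? '';
if (!is_valid_utf8($body)) {
    http_response_code(400); // assumption: a plain 400 is acceptable here
    exit('Bad Request');
}

// From here on, $body is known to be well-formed UTF-8.
```

With this check in place, an input such as "<scr\xC0ipt>" never reaches later filtering, so the question of what happens when the stray \xC0 byte is deleted or replaced does not come up.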