Re: [PHP-DEV] Inconsistency mbstring functions

Stefan Schiller via internals Tue, 05 Dec 2023 00:43:52 -0800

On Mon, Dec 4, 2023 at 8:45 PM Alex <alexinbeij...@gmail.com> wrote:
>
> Stefan,
>
>>
>> My biggest concern is that this quirk can cause security issues in
>> user code. I came across this in the first place when discovering an
>> exploitable security vulnerability in an application. From my point of
>> view, this is not only about inconsistent behavior but also violates
>> the documentation for specific functions like mb_strstr. I agree that
>> a lot of mbstring operations should not be used on invalid strings,
>> and an exception seems to be an appropriate answer despite the huge BC
>> impact.
>
>
> 1) I'm sure the vulnerable application is proprietary, but can you at least 
> tell us which mbstring functions were directly involved in this 
> vulnerability, give an overview of the mechanism by which the exploit worked 
> (scrubbed of any specific details about the application), and let us know the 
> general nature of the resulting compromise? (i.e. denial-of-service, 
> information disclosure, allowing users to impersonate other users, etc)
>
> Real-life examples of actual security impact provide far more compelling 
> grounds for a BC break than hypothetical security impact.


The case I am referring to involves mb_strpos, which is used to
determine the index of a specific character, and mb_substr, which is
used to extract the substring until the first occurrence of this
character. For this specific case, this results in a Cross-Site
Scripting (XSS) vulnerability, which would allow an attacker to
perform actions on behalf of an authenticated user. The actual code is
more complex, but the following lines should be a suitable
representation of the issue:

$data = $_GET['data'];
$pos = mb_strpos($data, "<");
$out = $pos ? mb_substr($data, 0, $pos) : $data;
echo $out;

The developer's assumption here is that $out should never contain an
opening HTML tag, and it is thus safe to echo it back in the response.
Despite the fact that e.g. htmlspecialchars should be used to encode
the output, this assumption seems reasonable to me. However, due to
the inconsistent behavior, an attacker can break this assumption. If
$data, e.g., contains the sequence "\xf0AAA<b>", str_pos will consider
the index of "<" to be 4. However, mb_substr assumes the full sequence
to have a length of 4 (1 x 4-byte Unicode character + 3 x 1-byte
Unicode characters). Accordingly, the full sequence, including the
"<b>" tag, is reflected in the response.

>
> 2) Your point about documentation is appreciated. However, you should 
> consider the context. We are dealing with a 20+-year old library, with a 
> large installed base, which has (historically) always had inconsistent, 
> poorly defined semantics in edge cases, and which (historically) has always 
> had poor documentation. Don't forget Hyrum's Law, Stefan. Never forget 
> Hyrum's Law.
>
> In view of all this, we are usually much more likely to amend the mbstring 
> documentation to match behavior than to modify behavior to match 
> documentation. Now, that's not to say we can never modify mbstring behavior 
> to match documentation (I have done it myself several times), but there 
> should always be good reasons to prefer the new behavior (not just "because 
> that's what the documentation says").

I can fully understand your concerns, especially taking into account
the long-standing history of the library and Hyrum's Law. While the
above-mentioned example might even be arguable since two different
functions are involved, the behavior of, e.g., mb_strstr just feels
wrong to me:

$data = "\xf0start";
var_dump(mb_strstr($data, "start")); // string(2) "rt"

>
> 3) Modifying mbstring behavior to make the handling of invalid strings more 
> consistent is a smaller BC break than raising an exception. However, 
> completely refusing to operate on invalid strings (by raising an exception) 
> will be more effective in guarding against potential security vulnerabilities.

Considering BC, completely refusing to operate on invalid strings
seems like a huge break. From a
preventing-security-issues-in-user-code point of view, either of these
solutions seems appropriate.

>
> 4) Hamada-san is absolutely correct that all mbstring users should check 
> strings derived from user input using mb_check_encoding() before operating on 
> them. Unfortunately, "should" doesn't count for much. Most users will not do 
> what they "should".
>
> 5) Amending the documentation to call out these dangers more clearly would be 
> a positive step, though probably not enough.
>

In general, I like the idea of warning against pitfalls like this in
the documentation, although I estimate the effect, especially on
existing applications to be low.

Best Regards,
Stefan Schiller

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php

Re: [PHP-DEV] Inconsistency mbstring functions

Reply via email to