This should really be fixed along the lines of the iconv //IGNORE flag, so that bad characters are just replaced with '?'.

We use it to render spam email summaries, and don't really care if the encoding is incorrect, as long as it shows something.

Throwing a warning without offering a fix or workaround just reduces the usefulness of the function.
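
To make the request concrete, here is a rough sketch in C of the kind of substitution I mean. It is not code from ext/json; the function names utf8_seq_len and utf8_replace_invalid are made up for illustration, and it assumes that replacing each offending byte in place with '?' is acceptable.

```c
#include <stddef.h>

/* Hypothetical helper (not part of ext/json): returns the length (1-4) of a
 * well-formed UTF-8 sequence starting at s, or 0 if the bytes are invalid. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    unsigned char c = s[0];
    unsigned long cp;
    size_t len, i;

    if (c < 0x80) {
        return 1;                              /* plain ASCII */
    } else if ((c & 0xE0) == 0xC0) {
        len = 2; cp = c & 0x1F;
    } else if ((c & 0xF0) == 0xE0) {
        len = 3; cp = c & 0x0F;
    } else if ((c & 0xF8) == 0xF0) {
        len = 4; cp = c & 0x07;
    } else {
        return 0;                              /* stray continuation or bad lead byte */
    }

    if (len > avail) {
        return 0;                              /* sequence truncated by end of input */
    }
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return 0;                          /* expected a continuation byte */
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong encodings, UTF-16 surrogates and out-of-range values. */
    if (len == 2 && cp < 0x80) return 0;
    if (len == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) return 0;
    if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return 0;
    return len;
}

/* Hypothetical helper: overwrite every invalid byte in buf[0..n) with '?'. */
static void utf8_replace_invalid(char *buf, size_t n)
{
    unsigned char *s = (unsigned char *)buf;
    size_t i = 0;

    while (i < n) {
        size_t len = utf8_seq_len(s + i, n - i);
        if (len == 0) {
            s[i] = '?';                        /* substitute and resync on next byte */
            i++;
        } else {
            i += len;                          /* skip over the valid sequence */
        }
    }
}
```

Run over the 3-byte input "ab\xE0", this leaves "ab?", so the caller still gets something to display instead of a truncated string.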

Regards
Alan

Stanislav Malyshev wrote:
Hi!

Right now, if json_encode() encounters invalid UTF-8 data, it just cuts the string off in the middle: no error is returned and no message is produced. Example:

var_dump(json_encode("ab\xE0"));
var_dump(json_encode("ab\xE0\""));

Both strings get cut at "ab". I think it's not a good idea to just silently cut the data. In fact, I think it is a bug caused by this code in ext/json/utf8_to_utf16.c:
        if (c < 0) {
            return UTF8_END ? the_index : UTF8_ERROR;
        }
which inherited this bug from code published on json.org. It should be:
        if (c < 0) {
            return (c == UTF8_END) ? the_index : UTF8_ERROR;
        }
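
To illustrate why the original condition never reports an error, here is a small standalone sketch. It assumes UTF8_END and UTF8_ERROR are defined as -1 and -2, as in the json.org decoder header; the two wrapper functions are hypothetical and exist only to isolate the expression.

```c
/* Assumed values, mirroring the constants in json.org's utf8_decode.h. */
#define UTF8_END   -1
#define UTF8_ERROR -2

/* Original condition: UTF8_END is a non-zero constant, so the ternary
 * always selects the_index, whatever c actually is. */
static int result_buggy(int c, int the_index)
{
    (void)c;                                   /* c is never consulted */
    return UTF8_END ? the_index : UTF8_ERROR;
}

/* Proposed condition: only a genuine end of input yields the_index;
 * any other negative c is reported as an error. */
static int result_fixed(int c, int the_index)
{
    return (c == UTF8_END) ? the_index : UTF8_ERROR;
}
```

With c set to UTF8_ERROR and the_index at 2 (the decoder having consumed "ab"), result_buggy returns 2, which looks like a successful two-character conversion, while result_fixed returns UTF8_ERROR.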
Now this is an easy fix, but it would lead to bad strings being silently converted to empty strings. The question is: should we raise an error there? If so, which one: E_WARNING or E_NOTICE? I'm for E_WARNING.
Also filed as bug #43941.
Any comments?


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
