On 21.08.2008 at 18:08, Rasmus Lerdorf wrote:

David Zülke wrote:
On 21.08.2008 at 03:34, William A. Rowe, Jr. wrote:

Stanislav Malyshev wrote:
Hi!
Are there any objections to incorporating the bugfix for #43941 (fix for
how json handles invalid UTF-8 sequences) into 5.2? I had some
requests about it; right now it's only in 5.3+.

Is there the alternative of substituting an unmappable character FFFD in place of the invalid sequence? This is a reasonable alternative behavior for some less stringent cases.

(Yes, the fix is better than the status quo, but just taking this a step
further).

I agree, that would be quite reasonable and also more consistent with
how UTF-8 works in other apps (browsers etc).

Well, using browsers as the benchmark here is a bad idea. IE is absolutely braindead about dealing with illegal UTF-8 chars. It will accept just about any sequence of bytes as a valid UTF-8 char which causes all sorts of problems.

I was talking about the common representation of an invalid sequence. That's the question mark sign you usually see in a browser when the encoding is incorrect.

According to the Unicode standard, U+FFFD is supposed to be used as the replacement character instead of simply stripping invalid ones:

Replacement Character. A character used as a substitute for an uninterpretable character from another encoding. The Unicode Standard uses U+FFFD replacement character for this function.
says http://unicode.org/glossary/#replacement_character

Rendering software which cannot process a Unicode character appropriately most often display it as only an open rectangle, or the Unicode “replacement character” (U+FFFD, �), to indicate the position of the unrecognized character.

says http://en.wikipedia.org/wiki/Unicode#Standardized_subsets

Also see http://www.fileformat.info/info/unicode/char/fffd/index.htm

As always, I consider sticking to specs good practice, so doing it in the above case would be wise :)
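
To make the two behaviours discussed in this thread concrete (rejecting invalid input versus substituting U+FFFD), here is a minimal sketch in Python. It is only an illustration, not the PHP json patch: the sample bytes and the helper names decode_strict and decode_with_replacement are made up for this example, and Python's built-in "replace" error handler stands in for whatever substitution logic a real fix would use.

```python
import json

# Hypothetical input: "caf" followed by a lone 0xE9 byte (Latin-1 "e acute"),
# which is not a valid UTF-8 sequence on its own.
raw = b"caf\xe9 ok"


def decode_strict(data: bytes):
    """Return the decoded text, or None if data is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_with_replacement(data: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for each invalid sequence."""
    return data.decode("utf-8", errors="replace")


strict_result = decode_strict(raw)              # the whole string is rejected
replaced_result = decode_with_replacement(raw)  # only the bad byte is replaced

# Once the text is valid Unicode, JSON encoding is unproblematic.
encoded = json.dumps(replaced_result)
```

With strict decoding the entire value is lost, while the replacement approach keeps the valid text around the bad byte and marks the position of the damage, which is the behaviour the Unicode glossary entry above describes.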

Hope that helps,

David
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
