On 3 April 2012 09:46, Rasmus Lerdorf <ras...@lerdorf.com> wrote:
> On 04/02/2012 06:35 PM, Charlie Somerville wrote:
>> I've created a pull request (https://github.com/php/php-src/pull/33) that
>> changes json_encode to fall back to ASCII for strings that are not valid
>> UTF-8.
>>
>> I ran into an issue in a production application involving PayPal IPN
>> callbacks (which are sent encoded as windows-1252) and json_encode(). If
>> there was an accented character present in the data, json_encode() would
>> fail to encode the string and serialize it as 'null'.
>>
>> I've modified the behaviour of the underlying json_escape_string()
>> implementation to attempt to encode strings anyway while still producing a
>> warning.
>
> JSON with non-Unicode strings is no longer JSON. The spec is explicit
> that all strings must be Unicode. The default encoding is UTF-8, but it
> could be UTF-16/32 as well.
>
> See http://www.ietf.org/rfc/rfc4627.txt

Agreed. I have an unfinished patch lying around for bug #61537 that actually
goes the other way: it changes the default behaviour of json_encode() to
match the documentation and return false if a string is invalid UTF-8,
rather than just nulling that string.

-1 from me on this. It's a regression from the current behaviour, IMO.

Adam

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php