On 3 April 2012 09:46, Rasmus Lerdorf <ras...@lerdorf.com> wrote:
> On 04/02/2012 06:35 PM, Charlie Somerville wrote:
>> I've created a pull request (https://github.com/php/php-src/pull/33) that 
>> changes json_encode to fall back to ASCII for strings that are not valid 
>> UTF-8.
>>
>> I ran into an issue in a production application involving PayPal IPN 
>> callbacks (which are sent encoded as windows-1252) and json_encode(). If 
>> there was an accented character present in the data, json_encode() would 
>> fail to encode the string and serialize it as 'null'.
>>
>> I've modified the behaviour of the underlying json_escape_string() 
>> implementation to attempt to encode strings anyway while still producing a 
>> warning.
>
> JSON with non-Unicode strings is no longer JSON. The spec is explicit
> that all strings must be Unicode. The default encoding is UTF-8, but it
> could be UTF-16/32 as well.
>
> See http://www.ietf.org/rfc/rfc4627.txt

Agreed. I have an unfinished patch lying around for bug #61537 that
actually goes the other way: it changes the default behaviour of
json_encode() to match the documentation and return false if a string
is invalid UTF-8, rather than just nulling that string.

-1 from me on this. It's a regression from the current behaviour, IMO.
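
For the PayPal IPN case, a rough sketch of handling this in userland
might look like the following. It assumes the payload is known to be
windows-1252 and that the mbstring extension is available; the sample
string and the 'name' key are made up for illustration.

```php
<?php
// "café" as windows-1252 bytes: a lone 0xE9 is not valid UTF-8.
$raw = "caf\xE9";

// Assumption: anything that is not valid UTF-8 is windows-1252.
// Transcode it so json_encode() receives valid UTF-8.
$utf8 = mb_check_encoding($raw, 'UTF-8')
    ? $raw
    : mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

$json = json_encode(array('name' => $utf8));
if ($json === false) {
    // Only reached when json_encode() rejects the whole value.
    throw new RuntimeException('json_encode failed, error ' . json_last_error());
}

echo $json, "\n"; // prints {"name":"caf\u00e9"}
```

Checking the return value against false keeps the caller correct
whichever way json_encode() ends up treating invalid input.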

Adam

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php