Serhiy Storchaka added the comment:

After investigating the problem more deeply, I see that a new parameter is not needed. RFC 4627 does not make exceptions for the range 0xD800-0xDFFF, so the decoder must accept lone surrogates, both escaped and unescaped. Non-BMP characters may be represented as an escaped surrogate pair, so an escaped surrogate pair may be decoded as a non-BMP character, while an unescaped surrogate pair should not be.
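
The rules above can be illustrated with a short sketch. It assumes an interpreter whose json module already behaves as described in this comment (that is, with the patch applied); the variable names are only illustrative.

```python
import json

# Escaped lone surrogate: accepted and kept as a single surrogate code point.
lone = json.loads('"\\ud800"')

# Escaped surrogate pair: combined into one non-BMP character.
pair = json.loads('"\\ud834\\udd20"')

# Unescaped surrogates already present in the input string:
# passed through unchanged as two separate code points.
raw = json.loads('"\ud834\udd20"')

# High surrogate followed by a non-surrogate escape: not a pair,
# so each escape is decoded on its own.
mixed = json.loads('"\\ud834\\u0079x"')

print(len(lone), len(pair), len(raw), len(mixed))
```

On an unpatched interpreter the results, in particular for the last case, may differ.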

Here is a patch with which the JSON decoder accepts encoded lone surrogates. It also fixes a bug in which the Python implementation decodes "\\ud834\\u0079x" as "\U0001d179".

----------
keywords: +patch
stage: needs patch -> patch review
title: Add a string error handler to JSON encoder/decoder -> JSON should accept lone surrogates
type: enhancement -> behavior
versions: +Python 2.7, Python 3.3
Added file: http://bugs.python.org/file30130/json_decode_lone_surrogates.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17906>
_______________________________________