[issue16586] json library can't parse large (> 2^31) strings
New submission from Dustin Boswell:

Here's a command-line that parses a json string containing a large array of short strings:

    python -c "import simplejson as json; json.loads('[' + '''\"asdfadf\", ''' * 1 + '\"asdfasf\"]') "

That works, but if you increase the size a little bit (so the string is > 2^31)

    python -c "import simplejson as json; json.loads('[' + '''\"asdfadf\", ''' * 3 + '\"asdfasf\"]') "
    Traceback (most recent call last):
      File "", line 1, in
      File "/usr/lib/pymodules/python2.6/simplejson/__init__.py", line 307, in loads
        return _default_decoder.decode(s)
      File "/usr/lib/pymodules/python2.6/simplejson/decoder.py", line 338, in decode
        raise ValueError(errmsg("Extra data", s, end, len(s)))
    ValueError: Extra data: line 1 column -994967285 - line 1 column 330011 (char -994967285 - 330011)

Here's my version:

    $ python
    Python 2.6.5 (r265:79063, Oct 1 2012, 22:04:36)
    [GCC 4.4.3] on linux2
    >>> import sys;print("%x" % sys.maxsize, sys.maxsize > 2**32)
    ('7fff', True)

Also note that the test above requires at least 20GB of memory (that's not a bug, just a heads-up).

--
components: Library (Lib)
messages: 176722
nosy: Dustin.Boswell
priority: normal
severity: normal
status: open
title: json library can't parse large (> 2^31) strings
type: crash
versions: Python 2.6, Python 2.7

___
Python tracker <http://bugs.python.org/issue16586>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16586] json library can't parse large (> 2^31) strings
Dustin Boswell added the comment:

Here's a slightly smaller/cleaner test case that only requires 12GB of ram to run:

    python -c "import simplejson as json; json.loads('[' + '''\"...\", ''' * 2 + '0]') "
    Traceback (most recent call last):
      File "", line 1, in
      File "/usr/lib/pymodules/python2.6/simplejson/__init__.py", line 307, in loads
[issue16586] json library can't parse large (> 2^31) strings
Dustin Boswell added the comment:

I thought simplejson was a standard module for 2.6, and got renamed to json (replacing the older json module) in later versions.

For instance, I get the same problem with 2.7 (no simplejson):

    python2.7 -c "import json; json.loads('[' + '''\"...\", ''' * 2 + '0]') "
    Traceback (most recent call last):
      File "", line 1, in
      File "/usr/local/lib/python2.7/json/__init__.py", line 326, in loads
        return _default_decoder.decode(s)
      File "/usr/local/lib/python2.7/json/decoder.py", line 369, in decode
        raise ValueError(errmsg("Extra data", s, end, len(s)))
    ValueError: Extra data: line 1 column -2094967293 - line 1 column 220003 (char -2094967293 - 220003)

And if I use the "json" module in 2.6 (which is 10x slower, takes over 30 minutes to run) it also fails, but with a different trace:

    python2.6 -c "import json; json.loads('[' + '''\"...\", ''' * 2 + '0]') "
    Traceback (most recent call last):
      File "", line 1, in
      File "/usr/lib/python2.6/json/__init__.py", line 307, in loads
        return _default_decoder.decode(s)
      File "/usr/lib/python2.6/json/decoder.py", line 319, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/lib/python2.6/json/decoder.py", line 336, in raw_decode
        obj, end = self._scanner.iterscan(s, **kw).next()
      File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
        rval, next_pos = action(m, context)
      File "/usr/lib/python2.6/json/decoder.py", line 217, in JSONArray
        value, end = iterscan(s, idx=end, context=context).next()
      File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
        rval, next_pos = action(m, context)
      File "/usr/lib/python2.6/json/decoder.py", line 155, in JSONString
        return scanstring(match.string, match.end(), encoding, strict)
    ValueError: end is out of bounds
[issue16586] json library can't parse large (> 2^31) strings
Dustin Boswell added the comment:

    Python 2.7.3 (default, Aug 3 2012, 20:01:21)
    [GCC 4.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys;print("%x" % sys.maxsize, sys.maxsize > 2**32)
    ('7fff', True)
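For context on why the sys.maxsize line is included in these reports: a 32-bit build could not hold a string longer than 2^31 in the first place, so the check rules that explanation out. Here is a minimal sketch of the same check as a function. The name is_64bit_build is made up for illustration, and the struct-based cross-check is an addition that does not appear in the thread.

```python
import struct
import sys


def is_64bit_build():
    """Return True when this interpreter is a 64-bit build.

    Hypothetical helper name; the comparison itself is the one used in
    the reports in this thread.
    """
    # sys.maxsize is the largest value a Py_ssize_t can hold, which is
    # 2**63 - 1 on a 64-bit build and 2**31 - 1 on a 32-bit build.
    return sys.maxsize > 2**32


# Same information as the one-liner quoted in the thread.
print("%x" % sys.maxsize, is_64bit_build())

# Cross-check (an addition, not from the thread): size of a C pointer in bits.
print(struct.calcsize("P") * 8)
```

On a 64-bit build both lines should agree, with the pointer size reported as 64.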
[issue16586] json library can't parse large (> 2^31) strings
Dustin Boswell added the comment:

Yes, bug exists on 3.1 (gcc build), as well as darwin build of 2.7:

    python3.1 -c "import json; json.loads('[%22s' % ']')"
    Traceback (most recent call last):
      File "", line 1, in
      File "/usr/lib/python3.1/json/__init__.py", line 293, in loads
        return _default_decoder.decode(s)
      File "/usr/lib/python3.1/json/decoder.py", line 328, in decode
        raise ValueError(errmsg("Extra data", s, end, len(s)))
    ValueError: Extra data: line 1 column -2094967295 - line 1 column 220001 (char -2094967295 - 220001)

    python3.1
    Python 3.1.2 (r312:79147, Oct 23 2012, 20:07:42)
    [GCC 4.4.3] on linux2
    >>> import sys;print("%x" % sys.maxsize, sys.maxsize > 2**32)
    7fff True

    python2.7 -c "import json; json.loads('[%22s' % ']')"
    Traceback (most recent call last):
      File "", line 1, in
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 326, in loads
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
    ValueError: Extra data: line 1 column -2094967295 - line 1 column 220001 (char -2094967295 - 220001)

    python2.7
    Python 2.7.2 (default, Jun 20 2012, 16:23:33)
    [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
    >>> import sys;print("%x" % sys.maxsize, sys.maxsize > 2**32)
    ('7fff', True)

--
versions: +Python 3.1
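The negative "char" offsets in the tracebacks in this thread look like what you would get if a position past 2^31 were stored in a signed 32-bit integer somewhere along the way. That is an assumption; nothing in the thread says where the truncation happens. The sketch below only does the arithmetic: it maps each reported negative offset back to the smallest non-negative position that would wrap to it. The helper names to_int32 and from_int32 are made up for illustration.

```python
def to_int32(n):
    """Interpret the low 32 bits of n as a signed two's-complement integer."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


def from_int32(n):
    """Return the smallest non-negative value that wraps to n in 32 bits."""
    return n % 2**32


# Negative offsets copied from the ValueError messages in this thread.
for reported in (-994967285, -2094967293, -2094967295):
    candidate = from_int32(reported)
    # Round-trip check: the candidate position wraps back to the reported value.
    assert to_int32(candidate) == reported
    print(reported, "->", candidate)
```

Under that assumption, the three offsets correspond to positions 3300000011, 2200000003 and 2200000001, all of which lie between 2^31 and 2^32.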