[issue16586] json library can't parse large (> 2^31) strings

2012-11-30 Thread Dustin Boswell

New submission from Dustin Boswell:

Here's a command line that parses a JSON string containing a large array of 
short strings:

python -c "import simplejson as json; json.loads('[' + '''\"asdfadf\", ''' * 
1 + '\"asdfasf\"]') "

That works, but if you increase the size a little bit (so that the string's length exceeds 2^31), it fails:

python -c "import simplejson as json; json.loads('[' + '''\"asdfadf\", ''' * 
3 + '\"asdfasf\"]') "

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/pymodules/python2.6/simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/pymodules/python2.6/simplejson/decoder.py", line 338, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column -994967285 - line 1 column 330011 (char -994967285 - 330011)
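
The negative column in the message above would be consistent with an offset above 2^31 being truncated to a signed 32-bit integer somewhere in the decoder, though nothing in this report confirms that cause. A minimal sketch of the arithmetic follows; `wrap_int32` is a hypothetical helper written for illustration, not part of json or simplejson:

```python
def wrap_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    # Values with the top bit set represent negative numbers in two's complement.
    return value - 0x100000000 if value >= 0x80000000 else value

# The largest offset that still fits stays positive...
print(wrap_int32(2**31 - 1))
# ...and the next one wraps around to a negative number.
print(wrap_int32(2**31))

# Adding 2**32 back to the reported offset gives a candidate true offset
# (an assumption about what happened, not something the traceback states).
print(-994967285 + 2**32)
```

If that reading is right, the reported offset corresponds to a position a little past 3.3 billion, which wraps back to -994967285 when passed through `wrap_int32`.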


Here's my version:

$ python
Python 2.6.5 (r265:79063, Oct  1 2012, 22:04:36) 
[GCC 4.4.3] on linux2
>>> import sys;print("%x" % sys.maxsize, sys.maxsize > 2**32)
('7fff', True)
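
As a cross-check on the one-liner above, the pointer size of the interpreter can also be read with the standard `struct` module. This is only a sketch of an alternative check; the variable name `pointer_bits` is chosen for illustration:

```python
import struct
import sys

# Size of a C pointer ("P") in bytes, converted to bits: 64 on a 64-bit build.
pointer_bits = struct.calcsize("P") * 8
print(pointer_bits)

# The same question asked the way the report asks it.
print(sys.maxsize > 2**32)
```

On a 64-bit build both checks should agree, which rules out a 32-bit interpreter as the explanation for the failure.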


Also note that the test above requires at least 20GB of memory (that's not a 
bug, just a heads-up).

--
components: Library (Lib)
messages: 176722
nosy: Dustin.Boswell
priority: normal
severity: normal
status: open
title: json library can't parse large (> 2^31) strings
type: crash
versions: Python 2.6, Python 2.7

___
Python tracker 
<http://bugs.python.org/issue16586>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16586] json library can't parse large (> 2^31) strings

2012-11-30 Thread Dustin Boswell

Dustin Boswell added the comment:

Here's a slightly smaller/cleaner test case that only requires 12GB of RAM to 
run:

python -c "import simplejson as json; json.loads('[' + '''\"...\", ''' * 
2 + '0]') "

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/pymodules/python2.6/simplejson/__init__.py", line 307, in loads
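
For readers without that much memory, a scaled-down sketch of the same input shape may help to sanity-check the command itself. It uses the standard `json` module; `build_test_document` and its `repeat` parameter are hypothetical names, and at this size the input parses normally and does not trigger the bug:

```python
import json

def build_test_document(repeat):
    """Return JSON text shaped like the test case: `repeat` short strings, then 0."""
    return '[' + '"...", ' * repeat + '0]'

# A tiny repeat count, purely to show the structure of the input.
doc = build_test_document(3)
print(doc)

# The decoded value is a list of the short strings followed by the integer 0.
values = json.loads(doc)
print(len(values))
```

Reproducing the failure would mean raising `repeat` until the text is longer than 2^31, which is where the memory requirement comes from.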

--




[issue16586] json library can't parse large (> 2^31) strings

2012-11-30 Thread Dustin Boswell

Dustin Boswell added the comment:

I thought simplejson was a standard module for 2.6, and got renamed to json 
(replacing the older json module) in later versions.

For instance, I get the same problem with 2.7 (no simplejson):

python2.7 -c "import json; json.loads('[' + '''\"...\", ''' * 2 + 
'0]') "
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column -2094967293 - line 1 column 220003 (char -2094967293 - 220003)


And if I use the "json" module in 2.6 (which is 10x slower and takes over 30 
minutes to run), it also fails, but with a different trace:

python2.6 -c "import json; json.loads('[' + '''\"...\", ''' * 2 + 
'0]') "
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.6/json/decoder.py", line 319, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.6/json/decoder.py", line 336, in raw_decode
    obj, end = self._scanner.iterscan(s, **kw).next()
  File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib/python2.6/json/decoder.py", line 217, in JSONArray
    value, end = iterscan(s, idx=end, context=context).next()
  File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib/python2.6/json/decoder.py", line 155, in JSONString
    return scanstring(match.string, match.end(), encoding, strict)
ValueError: end is out of bounds

--




[issue16586] json library can't parse large (> 2^31) strings

2012-12-01 Thread Dustin Boswell

Dustin Boswell added the comment:

Python 2.7.3 (default, Aug  3 2012, 20:01:21) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys;print("%x" % sys.maxsize, sys.maxsize > 2**32)
('7fff', True)

--




[issue16586] json library can't parse large (> 2^31) strings

2012-12-01 Thread Dustin Boswell

Dustin Boswell added the comment:

Yes, the bug exists on 3.1 (gcc build), as well as on the darwin build of 2.7:

python3.1 -c "import json; json.loads('[%22s' % ']')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.1/json/__init__.py", line 293, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.1/json/decoder.py", line 328, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column -2094967295 - line 1 column 220001 (char -2094967295 - 220001)

python3.1
Python 3.1.2 (r312:79147, Oct 23 2012, 20:07:42) 
[GCC 4.4.3] on linux2
>>> import sys;print("%x" % sys.maxsize, sys.maxsize > 2**32)
7fff True




python2.7 -c "import json; json.loads('[%22s' % ']')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 326, in loads
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
ValueError: Extra data: line 1 column -2094967295 - line 1 column 220001 (char -2094967295 - 220001)

python2.7
Python 2.7.2 (default, Jun 20 2012, 16:23:33) 
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
>>> import sys;print("%x" % sys.maxsize, sys.maxsize > 2**32)
('7fff', True)

--
versions: +Python 3.1
