New submission from Guido Ostkamp:

Hello,

a medium size regular expression crashes Python 2.5.1 as follows:

Traceback (most recent call last):
  File "./regtest.py", line 14, in <module>
    m = rematch(pats)
  File "./regtest.py", line 12, in rematch
    return re.compile(pat).match
  File "/export/home/ostkamp/local/lib/python2.5/re.py", line 180, in compile
    return _compile(pattern, flags)
  File "/export/home/ostkamp/local/lib/python2.5/re.py", line 231, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/export/home/ostkamp/local/lib/python2.5/sre_compile.py", line 530, in compile
    groupindex, indexgroup
OverflowError: regular expression code size limit exceeded

This is apparently caused by some code in Modules/_sre.c and Modules/sre.h as follows:

    self->code[i] = (SRE_CODE) value;
    if ((unsigned long) self->code[i] != value) {
        PyErr_SetString(PyExc_OverflowError,
                        "regular expression code size limit exceeded");
        break;
    }

An 'unsigned int' value is unnecessarily squeezed into an 'unsigned short' field defined in sre.h:

    #ifdef Py_UNICODE_WIDE
    #define SRE_CODE Py_UCS4
    #else
    #define SRE_CODE unsigned short
    #endif

On all systems I'm working on (SuSE Linux SLES 9, Solaris 8 etc.) the else case of the ifdef applies, which chooses 'unsigned short'.

I don't understand the relationship between 'unicode' and what is apparently the size of the regular expression stack here.

Some experiments have shown that changing the 'unsigned short' to 'unsigned long' and rebuilding Python fixes the problem.

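To illustrate the store-then-compare check quoted above, here is a minimal Python sketch of the same idea. It is not the actual _sre.c code: the function name store_code_word and its wide parameter are hypothetical, and ctypes integer types are used only because they drop high bits silently, much like the C cast does.

```python
import ctypes


def store_code_word(value, wide=False):
    """Mimic the narrowing check from the report (hypothetical helper).

    wide=False models SRE_CODE as a 16-bit 'unsigned short';
    wide=True models a 32-bit code word such as Py_UCS4.
    """
    ctype = ctypes.c_uint32 if wide else ctypes.c_uint16
    # ctypes performs no overflow checking, so high bits are silently dropped,
    # just as with the (SRE_CODE) cast in the C snippet.
    stored = ctype(value).value
    if stored != value:
        raise OverflowError("regular expression code size limit exceeded")
    return stored
```

With a 16-bit code word, any value above 65535 fails the comparison after truncation, while the same value passes once the code word is 32 bits wide. That would be consistent with the observation that widening the type makes the error go away.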
Here is a test program to reproduce the error:

#!/usr/bin/env python

import re, random, sys

def randhexstring():
    return "".join(["%04x" % random.randint(0, 0xffff) for x in range(20)])

pats = [randhexstring() for x in range(1000)]

def rematch(pats):
    pat = '(?:%s)' % '|'.join(pats)
    return re.compile(pat).match

m = rematch(pats)

count = 0
for s in pats * 100:
    if m(s):
        count += 1

print count

Regards

Guido

----------
components: Regular Expressions
messages: 55885
nosy: ostkamp
severity: normal
status: open
title: Medium size regexp crashes python
type: crash
versions: Python 2.5

__________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1160>
__________________________________