[Richard Meraz]
> We need to capture more than 99 named groups using python regular
> expressions.
> ...
> its clear why the language designers have decided on this limitation. For
> our system, however, it is essential that we be able to capture an arbitrary
> number of groups.
>
> Could anyone on the list suggest what parts of the library code make
> assumptions about this restriction? We'd like to make some local changes to
> the core library to allow us to continue the development of our system (we
> don't want to switch to another language). We removed the condition in
> sre_compile.py that raises an exception for compiled regexps with more than
> 100 groups. This allowed us to compile a regular expression with more than
> 100 groups, but subsequent attempts to match or search with that regular
> expression resulted in segfaults.

Which is a good clue that you'll have to understand the C code implementing regexps. That's in Modules/_sre.c. In the absence of understanding, your best bet is to get in a debugger, see where it's segfaulting, guess at the cause, try to fix it, and start over.

For a start, you'll certainly need to boost the value of this #define in sre.h:

    #define SRE_MARK_SIZE 200

Sorry, but I have no idea whether you'll need more than just that.
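
If it helps with the guess-fix-retry loop, here is a rough sketch of a reproduction script, not anything taken from the library itself. The helper names build_pattern and try_groups are made up for illustration. It builds a pattern with a chosen number of named groups, compiles it, and tries a match. It assumes a refused compile surfaces as AssertionError or re.error; which one you get (if any) depends on the Python version.

```python
import re


def build_pattern(n_groups):
    """Return pattern source with n_groups named groups, each matching one 'a'."""
    return "".join("(?P<g%d>a)" % i for i in range(n_groups))


def try_groups(n_groups):
    """Compile and match a pattern with n_groups named groups.

    Returns the number of groups captured, or None if the regex
    compiler refuses the pattern (assumed to raise AssertionError
    or re.error, depending on the Python version).
    """
    source = build_pattern(n_groups)
    try:
        compiled = re.compile(source)
    except (AssertionError, re.error):
        return None
    match = compiled.match("a" * n_groups)
    if match is None:
        return 0
    return len(match.groups())


if __name__ == "__main__":
    # Step across the old limit to see where things start to go wrong.
    for n in (50, 99, 100, 101, 150):
        print("%d groups -> %r" % (n, try_groups(n)))
```

A crash inside the C matcher can't be caught from Python, so run this as its own process (ideally under the debugger) after each change to the C code; the last line printed before the crash tells you which group count triggered it.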