Bugs item #1054564, was opened at 2004-10-26 14:55
Message generated for change (Comment added) made by effbot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1054564&group_id=5470
Category: Regular Expressions
Group: Python 2.2.3
>Status: Closed
>Resolution: Out of Date
>Priority: 9
Submitted By: Rob (rwhent)
Assigned to: Fredrik Lundh (effbot)
Summary: RE '*.?' cores if len of found string exceeds 10000

Initial Comment:
Whilst parsing some extremely long strings, I found that re.match causes segmentation faults on Solaris 2.8 when the regex contains '.*?' and the part of the string matched by that part of the regex exceeds 10000 chars (actually, it seemed to fail at exactly 8192 chars).

This is the regex used:

    if re.match(r'^.*?\[\s*[A-Za-z_0-9]+\s*\].*', string):

This regex looks for '[alphaNum_]' anywhere in a large string. When it failed, the string was 8192 chars long and contained no matching '[alphaNum_]'. If I reduce the length of the string below 8192, it works OK.

This is a major issue for my application, as some of the strings to be parsed are very large. I saw some discussion of a similar issue on another bulletin board.

----------------------------------------------------------------------

>Comment By: Fredrik Lundh (effbot)
Date: 2005-02-14 12:35

Message:
Logged In: YES
user_id=38376

Closing, due to lack of feedback. The suggested workarounds should solve the problem.

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2004-10-30 17:44

Message:
Logged In: YES
user_id=341410

In the case of this particular search, you could write your own little searcher. The following could likely be done better, but it is a quick 5-minute job that won't core on you unless something is really wrong with Python, and it may be a reasonable stopgap until someone redoes the regular expression library.
    import string

    def find_thing(s):
        # Characters allowed between the brackets: letters, digits, '_'.
        d = dict.fromkeys(string.letters + string.digits + '_')
        sp = 0
        while sp < len(s):
            # Find the next '[' at or after position sp.
            start = s.find('[', sp)
            if start == -1:
                return None
            for i in xrange(start + 1, len(s)):
                if s[i] in d:
                    continue
                elif s[i] == ']' and i > start + 1:
                    # At least one allowed character, then ']'.
                    return s[start:i+1]
                else:
                    # No match at this '['; resume scanning from here.
                    sp = i
                    break
            else:
                # Reached the end of the string without a closing ']'.
                return None
        return None

It returns None on failure to find, and the matched string otherwise.

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2004-10-26 15:24

Message:
Logged In: YES
user_id=38376

btw, if you're searching for things, why not use the "search" method?

    if re.search(r'\[\s*[A-Za-z_0-9]+\s*\]', string):

(Also, "[A-Za-z_0-9]" is better spelled "\w".)

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2004-10-26 15:20

Message:
Logged In: YES
user_id=38376

The max recursion limit problem in the re module is well known. Until this limitation in the implementation is removed, see the following for ways to work around it:

http://www.python.org/dev/doc/devel/lib/module-re.html
http://python.org/sf/493252

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
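A minimal sketch of the re.search workaround suggested in this thread, assuming Python 3 (the thread itself concerns Python 2.2). The helper name find_bracketed_word and the constant BRACKETED_WORD are hypothetical. Dropping the leading '^.*?' and trailing '.*' lets re.search do the scanning, and re.ASCII keeps "\w" equal to [A-Za-z0-9_], since on Python 3 str patterns "\w" is Unicode-aware by default.

```python
import re

# Hypothetical helper: the bracket group from the report, without the
# '^.*?' prefix and '.*' suffix that re.match needed.
# re.ASCII restricts \w to [A-Za-z0-9_], matching the original class.
BRACKETED_WORD = re.compile(r'\[\s*\w+\s*\]', re.ASCII)


def find_bracketed_word(text):
    """Return the first '[word]' group in text, or None if there is none."""
    m = BRACKETED_WORD.search(text)
    return m.group(0) if m else None


print(find_bracketed_word('abc [ foo_1 ] def'))  # prints "[ foo_1 ]"
print(find_bracketed_word('x' * 100000))         # prints "None"
```

Because the pattern starts at the literal '[', the engine has no lazy '.*?' to expand across the whole input before each attempt.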