Bugs item #1662581, was opened at 2007-02-17 15:39 Message generated for change (Comment added) made by josiahcarlson You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1662581&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Performance Group: None Status: Open Resolution: None Priority: 4 Private: No Submitted By: Gregory P. Smith (greg) Assigned to: Nobody/Anonymous (nobody) Summary: the re module can perform poorly: O(2**n) versus O(n**2) Initial Comment: in short, the re module can degenerate to really really horrid performance. See this for how and why: http://swtch.com/~rsc/regexp/regexp1.html exponential decline instead of squared. I don't have a patch so i'm filing this bug as a starting point for future work. The Modules/_sre.c files implementation could be updated to use the parallel stepping Thompson approach instead of recursive backtracking. filing this as a bug until me or someone else comes up with a patch. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-02-22 00:51 Message: Logged In: YES user_id=341410 Originator: NO I would file this under "feature request"; the current situation isn't so much buggy, as slow. While you can produce a segfault with the current regular expression engine (due to stack overflow), you can do the same thing with regular Python on Linux (with sys.setrecursionlimit), ctypes, etc., and none of those are considered as buggy. My only concern with such a change is that it may or may not change the semantics of the repeat operators '*' and '+', which are currently defined as "greedy". If I skimmed the article correctly late at night, switching to a Thompson family regular expression engine may result in those operators no longer being greedy. Please correct me if I am wrong. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1662581&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com