Bugs item #1662581, was opened at 2007-02-17 15:39
Message generated for change (Comment added) made by josiahcarlson
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1662581&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Performance
Group: None
Status: Open
Resolution: None
Priority: 4
Private: No
Submitted By: Gregory P. Smith (greg)
Assigned to: Nobody/Anonymous (nobody)
Summary: the re module can perform poorly: O(2**n) versus O(n**2)

Initial Comment:
in short, the re module can degenerate to really really horrid performance.  
See this for how and why:

 http://swtch.com/~rsc/regexp/regexp1.html

exponential decline instead of squared.

I don't have a patch so i'm filing this bug as a starting point for future 
work.  The Modules/_sre.c files implementation could be updated to use the 
parallel stepping Thompson approach instead of recursive backtracking.

filing this as a bug until me or someone else comes up with a patch.

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2007-02-22 00:51

Message:
Logged In: YES 
user_id=341410
Originator: NO

I would file this under "feature request"; the current situation isn't so
much buggy, as slow.  While you can produce a segfault with the current
regular expression engine (due to stack overflow), you can do the same
thing with regular Python on Linux (with sys.setrecursionlimit), ctypes,
etc., and none of those are considered as buggy.

My only concern with such a change is that it may or may not change the
semantics of the repeat operators '*' and '+', which are currently defined
as "greedy".  If I skimmed the article correctly late at night, switching
to a Thompson family regular expression engine may result in those
operators no longer being greedy.  Please correct me if I am wrong.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1662581&group_id=5470
_______________________________________________
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to