Bugs item #1515829, was opened at 2006-07-02 08:26 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1515829&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Regular Expressions Group: Python 2.5 Status: Open Resolution: None Priority: 5 Submitted By: Erik Demaine (edemaine) Assigned to: Gustavo Niemeyer (niemeyer) Summary: Exponential behavior in regular expression Initial Comment: 're' seems to have serious performance trouble with nested Kleene stars in the regular expression, if the matched text is fairly long. Attached is an example, naturally arising in some LaTeX parsing [admittedly not the only way to do it], along with a text generator parameterized by a repetition count n. Here is simple timing data on a Pentium 4 1.5GHz with 1.5GB RAM as a function of n: [...] n=4: 0:00:00.015000 n=5: 0:00:00.032000 n=6: 0:00:00.140000 n=7: 0:00:00.594000 n=8: 0:00:02.203000 n=9: 0:00:08.859000 n=10: 0:00:39.641000 n=11: 0:02:44.172000 n=12: 0:10:23.500000 This seems far slower than it should be, but I don't know the algorithm used by 're'. Is this behavior expected? If so, should it be optimized away by changing the algorithm? The generated text consists of a few lines of preamble, then a variable number n of copies of a partiuclar line, followed by a few lines of postscript. The first line of the postscript causes the regular expression *not* to match, and 're' spends a long time to find that out. Removing that line from the postscript, and causing the regular expression to match, makes the program run instantly. I get the same behavior on Python 2.4 and 2.5b1, on Windows and Linux, and with re.sub and re.search. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1515829&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com