Bugs item #1566086, was opened at 2006-09-26 21:23
Message generated for change (Comment added) made by akuchling
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1566086&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Regular Expressions
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Fabien Devaux (fabinator)
Assigned to: Gustavo Niemeyer (niemeyer)
Summary: RE (regular expression) matching stuck in loop

Initial Comment:
See the code:
http://pastebin.ca/183613
The "finditer()" call doesn't seem to return.
Tweaking the regular expression works around the problem, but it
looks like a bug.
(I'm really sorry if I did something wrong and didn't
notice)

Note: I can also reproduce it with Python 2.5.

----------------------------------------------------------------------

>Comment By: A.M. Kuchling (akuchling)
Date: 2006-10-26 15:53

Message:

I haven't dug very far into the code, but suspect this isn't
a bug in the regex code.

The pattern uses lots of .*? subpatterns, and this often
means the pattern takes a long time to fail if it isn't
going to match.  The regex engine matches the <link> group,
and then there's a .*?, followed by <b>.  The engine advances
one character at a time, and every time it sees a <b> it
tries the next .*?, which can scan the rest of the string
before failing.  This is O(n**2), where n is the number of
characters in the string being searched, and that string is
93,000 characters long.  If you limit the string to 5K or so,
the match fails pretty quickly.
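
The quadratic cost of a failing match can be sketched with a
small timing script.  This is only an illustration: the pattern,
the </never> terminator, and the make_page() and
time_failed_search() helpers are made up for the example and are
not taken from the submitter's script, and it assumes Python 3
for time.perf_counter().

```python
import re
import time

# Hypothetical pattern shaped like the one described above: a <link>
# anchor, a lazy gap, a <b> tag, another lazy gap, and a terminator
# that never occurs in the test data.
PATTERN = re.compile(r"<link>(.*?)<b>(.*?)</never>", re.DOTALL)


def make_page(n_bold):
    """Build a synthetic page: one <link>, then n_bold <b> tags."""
    return "<link>" + "<b>filler text " * n_bold


def time_failed_search(n_bold):
    """Return the seconds spent on a search that cannot succeed."""
    page = make_page(n_bold)
    start = time.perf_counter()
    result = PATTERN.search(page)
    elapsed = time.perf_counter() - start
    assert result is None  # the terminator never appears
    return elapsed


if __name__ == "__main__":
    # Each <b> triggers a scan of the rest of the page, so doubling
    # the page size should roughly quadruple the time to fail.
    for n in (250, 500, 1000):
        print(n, "bold tags:", round(time_failed_search(n), 4), "s")
```

Timings depend on the machine, so the script prints them rather
than asserting a particular ratio.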

I strongly suggest working with the parsed HTML rather than
matching it with a regex.  You could run the HTML through tidy
to convert it to XHTML and use ElementTree on the resulting XML.
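
A minimal sketch of the ElementTree approach, assuming the page
has already been converted to well-formed XHTML (for example by
tidy).  The SAMPLE document and the links_and_bold() helper are
invented for illustration; the real page's structure will differ.

```python
import xml.etree.ElementTree as ET

# Tidy's XHTML output puts elements in the XHTML namespace, so tag
# names must be qualified when searching.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Made-up stand-in for the tidied page.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p><a href="/item/1">First</a> <b>bold one</b></p>
    <p><a href="/item/2">Second</a> <b>bold two</b></p>
  </body>
</html>"""


def links_and_bold(xhtml_text):
    """Return (href, bold text) pairs for each paragraph that has both."""
    root = ET.fromstring(xhtml_text)
    results = []
    for para in root.iter(XHTML_NS + "p"):
        link = para.find(XHTML_NS + "a")
        bold = para.find(XHTML_NS + "b")
        if link is not None and bold is not None:
            results.append((link.get("href"), bold.text))
    return results


if __name__ == "__main__":
    for href, text in links_and_bold(SAMPLE):
        print(href, text)
```

Walking the tree visits each element once, so the cost grows
linearly with the size of the page instead of quadratically.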


----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2006-10-26 15:38

Message:

Attaching the test script.  I've modified it to save a copy
of the HTML page's data so that running the example doesn't
require accessing a slow web site repeatedly.

----------------------------------------------------------------------
