Bugs item #1541697, was opened at 2006-08-16 18:51
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1541697&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: Python Library
Group: Python 2.5
>Status: Closed
>Resolution: Fixed
>Priority: 5
Submitted By: John J Lee (jjlee)
>Assigned to: Neal Norwitz (nnorwitz)
Summary: Recently introduced sgmllib regexp bug hangs Python

Initial Comment:
Looks like revision 47154 introduced a regexp that hangs Python
(Ctrl-C won't kill the process, CPU usage sits near 100%) under
some circumstances. A test case is attached (sgmllib.html and
hang_sgmllib.py).

The problem isn't seen if you read the whole file (or nearly the
whole file) at once. But that doesn't make it a non-bug, AFAICS.

I'm not sure what the problem is, but presumably the relevant part
of the patch is this:

+starttag = re.compile(r'<[a-zA-Z][-_.:a-zA-Z0-9]*\s*('
+        r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'
+        r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)[EMAIL PROTECTED]'
+        r'[][\-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]*(?=[\s>/<])))?'
+        r')*\s*/?\s*(?=[<>])')

The patch attached to bug 1515142 (also from Sam Ruby; it claims to
fix a regression introduced by his recent sgmllib patches, and has
not yet been applied) does NOT fix the problem.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2006-09-10 21:25

Message:
Logged In: YES
user_id=33168

I reverted the patch and added the test case for sgml so the
infinite loop doesn't recur.

Committed revision 51854. (head)
Committed revision 51850. (2.5)
Committed revision 51853. (2.4)

I will add the hang_re test case to test_crashers or somewhere.

----------------------------------------------------------------------

Comment By: kovan (kovan)
Date: 2006-09-05 14:40

Message:
Logged In: YES
user_id=1426755

Sorry, correct URL is
http://svn.python.org/view/python/trunk/Lib/sgmllib.py?rev=47154&r1=47080&r2=47154

----------------------------------------------------------------------

Comment By: kovan (kovan)
Date: 2006-09-05 14:24

Message:
Logged In: YES
user_id=1426755

Again FYI, here's the diff where presumably the bug was introduced:
http://svn.python.org/view/python/trunk/Lib/sgmllib.py?rev=47080&r1=46996&r2=47080

----------------------------------------------------------------------

Comment By: kovan (kovan)
Date: 2006-09-05 14:04

Message:
Logged In: YES
user_id=1426755

I've been testing quiver's test case:

- With Eclipse's QuickREx plugin: it hangs. It was configured in
  PCRE mode (which uses Jakarta-ORO Perl 5 regular expressions
  implementation), and no additional options.
- With grep: grep exits with a fatal error and dumps a stack trace.
  grep was run also in Perl mode, with the command:
  grep -P -f regexp.txt test.txt

I can't find an explanation for this, but I don't know much about
regexps. I hope it has some utility for the resolution of this bug
nevertheless.

----------------------------------------------------------------------

Comment By: George Yoshida (quiver)
Date: 2006-08-17 21:55

Message:
Logged In: YES
user_id=671362

Slimmed down test case is attached. (The regex pattern in question
is used.)

FYI, r47154 is backported to 2.4 branch (r47155).

----------------------------------------------------------------------
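
The thread above does not say why the pattern hangs. As a hedged
illustration only: one common way a backtracking regex engine can
appear to hang is when a repeated group can consume the same input in
many different ways, so a failed match forces it to try every split.
The reported starttag pattern has a starred group containing optional
whitespace and a repeatable attribute name, which may allow this, but
that diagnosis is an assumption and is not confirmed in this report.
The sketch below uses two hypothetical stand-in patterns (AMBIGUOUS and
UNAMBIGUOUS, not taken from sgmllib, and not the attached
hang_sgmllib.py) and a hypothetical helper, time_failed_match, to show
the effect on deliberately small inputs. It is written for Python 3.

```python
import re
import time

# Hypothetical stand-in patterns, NOT the sgmllib regex from this report.
# AMBIGUOUS repeats a group that itself repeats, so a run of "a" characters
# can be divided between the inner and outer loops in very many ways.
AMBIGUOUS = re.compile(r'(a+)+$')
# UNAMBIGUOUS accepts the same strings but has one way to consume a run.
UNAMBIGUOUS = re.compile(r'a+$')


def time_failed_match(pattern, n):
    """Return (match_result, seconds) for an input that cannot match."""
    text = 'a' * n + 'b'  # the trailing "b" makes every attempt fail at "$"
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


if __name__ == '__main__':
    # Keep n small: with the ambiguous pattern the time grows very quickly
    # (roughly doubling per extra character on CPython's re engine).
    for n in (10, 14, 18, 20):
        _, slow = time_failed_match(AMBIGUOUS, n)
        _, fast = time_failed_match(UNAMBIGUOUS, n)
        print('n=%2d  ambiguous=%.4fs  unambiguous=%.6fs' % (n, slow, fast))
```

Run as a script, this prints one timing line per input length; the
absolute numbers depend on the machine and Python version, so none are
quoted here. Both patterns reject the same inputs, and only the time
taken to reject them differs.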