Bugs item #1541697, was opened at 2006-08-16 18:51
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1541697&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.5
>Status: Closed
>Resolution: Fixed
>Priority: 5
Submitted By: John J Lee (jjlee)
>Assigned to: Neal Norwitz (nnorwitz)
Summary: Recently introduced sgmllib regexp bug hangs Python

Initial Comment:
Looks like revision 47154 introduced a regexp that
hangs Python (Ctrl-C won't kill the process, CPU usage
sits near 100%) under some circumstances.  A test case
is attached (sgmllib.html and hang_sgmllib.py).

The problem isn't seen if you read the whole file (or
nearly the whole file) at once.  But that doesn't make
it a non-bug, AFAICS.

I'm not sure what the problem is, but presumably the
relevant part of the patch is this:

+starttag = re.compile(r'<[a-zA-Z][-_.:a-zA-Z0-9]*\s*('
+        r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'
+        r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)[EMAIL PROTECTED]'
+        r'[][\-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]*(?=[\s>/<])))?'
+    r')*\s*/?\s*(?=[<>])')


The patch attached to bug 1515142 (also from Sam Ruby
-- claims to fix a regression introduced by his recent
sgmllib patches, and has not yet been applied) does NOT
fix the problem.
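
One plausible reading of why chunked input matters (an illustration, not
a diagnosis confirmed anywhere in this thread): the outer (...)* group can
split a run of name characters into attribute names in many different
ways. If the buffer ends before any '<' or '>' arrives, the final
lookahead fails and the engine tries every split. The sketch below uses a
simplified pattern, SIMPLE_STARTTAG, and a helper, time_match(); both are
assumptions made up for this example and are not the regexp from r47154.

```python
import re
import time

# Simplified stand-in for the start-tag pattern (an assumption for
# illustration, NOT the regexp from r47154).  It keeps the same shape:
# a repeated group of optional whitespace plus an attribute name,
# followed by a lookahead that needs '<' or '>'.
SIMPLE_STARTTAG = re.compile(
    r'<[a-zA-Z]+\s*'
    r'(\s*[a-zA-Z_][a-zA-Z_0-9]*)*'
    r'\s*(?=[<>])')


def time_match(text):
    """Return (match_or_None, elapsed_seconds) for one match attempt."""
    start = time.perf_counter()
    result = SIMPLE_STARTTAG.match(text)
    return result, time.perf_counter() - start


# Complete tag: the lookahead sees '>' and the match succeeds quickly.
complete, _ = time_match('<a ' + 'x' * 16 + '>')

# Truncated tag, as a parser fed small chunks might see it: no '<' or
# '>' follows, so every way of splitting the x's into names is tried.
# Sizes are kept small on purpose; the work roughly doubles per char.
timings = []
for n in (8, 10, 12, 14, 16):
    truncated, elapsed = time_match('<a ' + 'x' * n)
    timings.append((n, elapsed))
    print(n, '%.4f s' % elapsed)
```

With the whole tag in the buffer the first attempt succeeds, which would
be consistent with the hang not showing up when the file is read in one
go.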


----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2006-09-10 21:25

Message:
Logged In: YES 
user_id=33168

I reverted the patch and added the test case for sgml so the
infinite loop doesn't recur.

Committed revision 51854. (head)
Committed revision 51850. (2.5)
Committed revision 51853. (2.4)

I will add the hang_re test case to test_crashers or somewhere.
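
Because a runaway match cannot be interrupted with Ctrl-C (as the
initial report notes), one way to guard a test like this is to run the
match in a child process and kill it on timeout. A minimal sketch,
assuming a hypothetical helper finishes_within(); this is not the test
committed in the revisions above.

```python
import subprocess
import sys

# Child program: match the pattern in argv[1] against the text in argv[2].
_CHILD = "import re, sys; re.match(sys.argv[1], sys.argv[2])"


def finishes_within(pattern, text, seconds):
    """Hypothetical helper: True if re.match() returns within `seconds`.

    The match runs in a separate interpreter so that a hang cannot
    take the calling process down with it.
    """
    try:
        # subprocess.run() kills the child if the timeout expires.
        subprocess.run([sys.executable, "-c", _CHILD, pattern, text],
                       timeout=seconds, check=True)
    except subprocess.TimeoutExpired:
        return False
    return True
```

A regression test could then assert finishes_within(pattern, data, n)
for the attached data, with n chosen generously enough to avoid false
alarms on slow machines.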

----------------------------------------------------------------------

Comment By: kovan (kovan)
Date: 2006-09-05 14:40

Message:
Logged In: YES 
user_id=1426755

Sorry, correct URL is
http://svn.python.org/view/python/trunk/Lib/sgmllib.py?rev=47154&r1=47080&r2=47154



----------------------------------------------------------------------

Comment By: kovan (kovan)
Date: 2006-09-05 14:24

Message:
Logged In: YES 
user_id=1426755

Again FYI, here's the diff where presumably the bug was
introduced:
http://svn.python.org/view/python/trunk/Lib/sgmllib.py?rev=47080&r1=46996&r2=47080

----------------------------------------------------------------------

Comment By: kovan (kovan)
Date: 2006-09-05 14:04

Message:
Logged In: YES 
user_id=1426755

I've been testing quiver's test case:

- With Eclipse's QuickREx plugin: it hangs. The plugin was
configured in PCRE mode (which uses the Jakarta-ORO Perl 5
regular expression implementation), with no additional options.

- With grep: grep exits with a fatal error and dumps a stack
trace. grep was also run in Perl mode, with the command:
grep -P -f regexp.txt test.txt

I can't find an explanation for this, but I don't know much
about regexps. I hope it is of some use in resolving this
bug nevertheless.

----------------------------------------------------------------------

Comment By: George Yoshida (quiver)
Date: 2006-08-17 21:55

Message:
Logged In: YES 
user_id=671362

A slimmed-down test case is attached. (It uses the regex
pattern in question.)

FYI, r47154 was backported to the 2.4 branch (r47155).


----------------------------------------------------------------------
