Bugs item #1144533, was opened at 2005-02-19 13:02
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1144533&group_id=5470

Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Allan Hoeltje (ahoeltje)
Assigned to: Nobody/Anonymous (nobody)
Summary: htmllib quote parse error within a <script>

Initial Comment:
I am using the htmllib to parse web pages for plain text content. I came across a web page that contained a script construct similar to the example below. Note that the script is itself writing a script. The htmllib appears to be confused by the single and double quotes used between the real <script> and </script> tags.

I am using "Python 2.3 (#1, Sep 13 2003, 00:49:11) [GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin" on a PowerBook G4 running OSX 10.3.8.

<html>
<body>
<h1> This is a test </h1>
<br>
<blockquote>
<script language="JavaScript">
rnum = Math.round( Math.random() * 100000 );
document.write( '<scr' + 'ipt src="http://www.a.org/' + rnum + '/"></scr' + 'ipt>' );
</script>
</blockquote>
</body>
</html>

Here is the Python trace:

Traceback (most recent call last):
  File "cleanFeed.py", line 26, in ?
    clean = stripHtml.strip( feed )
  File "/Users/allan/Desktop/Mood for Today/stripHtml.py", line 144, in strip
    parser.feed(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/HTMLParser.py", line 150, in goahead
    k = self.parse_endtag(i)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/HTMLParser.py", line 327, in parse_endtag
    self.error("bad end tag: %s" % `rawdata[i:j]`)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: bad end tag: "</scr' + 'ipt>", at line 1, column 309
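
A possible workaround, sketched here under several assumptions, is to remove whole <script> elements from the page before handing it to the parser, so that the quoted "</scr' + 'ipt>" fragment never reaches parse_endtag. This is not a fix for the parser itself. The names SCRIPT_RE, TextCollector and strip_html are hypothetical and are not taken from the reporter's stripHtml.py, which is not shown in this report. The sketch is written against Python 3, where the module is named html.parser; the report concerns the HTMLParser module shipped with Python 2.3.

```python
import re
from html.parser import HTMLParser  # Python 3 name; Python 2.3 calls the module HTMLParser

# Matches one whole <script>...</script> element, case-insensitively and across
# line breaks. The non-greedy .*? stops at the first literal </script> end tag.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


class TextCollector(HTMLParser):
    """Hypothetical helper: keeps character data and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            self.chunks.append(text)


def strip_html(page):
    """Drop script elements first, then collect the remaining plain text."""
    without_scripts = SCRIPT_RE.sub("", page)
    parser = TextCollector()
    parser.feed(without_scripts)
    parser.close()
    return " ".join(parser.chunks)


# The sample page from this report.
SAMPLE = """<html> <body> <h1> This is a test </h1> <br> <blockquote>
<script language="JavaScript">
rnum = Math.round( Math.random() * 100000 );
document.write( '<scr' + 'ipt src="http://www.a.org/' + rnum + '/"></scr' + 'ipt>' );
</script>
</blockquote> </body> </html>"""

print(strip_html(SAMPLE))  # prints: This is a test
```

The regular expression relies on the same trick the page author used: because the inner end tag is split into '</scr' + 'ipt>', the first literal </script> in the text is the real one, so the non-greedy match covers exactly the script element.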