Bugs item #1548288, was opened at 2006-08-28 23:32
Message generated for change (Tracker Item Submitted) made by Item Submitter

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1548288&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Andres Riancho (andresriancho)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmllib.sgmlparser is not thread safe

Initial Comment:

Python version:
===============

[EMAIL PROTECTED]:~$ python
Python 2.4.3 (#2, Apr 27 2006, 14:43:58)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2

Problem description:
====================

sgmllib's SGMLParser is not thread safe. I discovered this problem when trying to fetch and parse many HTML files at the same time. An example of this bug is attached.

The parser's input HTML is the string '<html></html>'*100. It was written this way to simplify the code; note that if you replace this string with a "large" HTML document, it also fails.

Solution:
=========

Make the library thread safe, or add a note to the documentation saying that it isn't thread safe.
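The two remedies asked for above can be sketched as follows, under stated assumptions. sgmllib was removed in Python 3, so this sketch uses the standard library's html.parser.HTMLParser as a stand-in; it also keeps its buffered input in instance attributes (such as rawdata), so sharing one instance across threads is presumably unsafe in the same way. TagCollector, parse_one, LockedParser and run_threads are hypothetical names for illustration, not part of either library. Option 1 avoids sharing a parser at all; option 2 keeps one shared parser but serializes access with a threading.Lock.

```python
import threading
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical parser that records the names of start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_one(html):
    """Option 1: a fresh parser per call, so no parser state is shared."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


class LockedParser:
    """Option 2 (hypothetical wrapper): one shared parser, one document at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._parser = TagCollector()

    def parse(self, html):
        # Hold the lock for the whole reset/feed/close cycle so documents
        # from different threads never interleave in the parser's buffer.
        with self._lock:
            self._parser.reset()
            self._parser.tags = []
            self._parser.feed(html)
            self._parser.close()
            return self._parser.tags


def run_threads(parse, documents):
    """Parse each document in its own thread and collect results in order."""
    results = [None] * len(documents)

    def worker(index, html):
        # Each thread writes only its own slot of the results list.
        results[index] = parse(html)

    threads = [threading.Thread(target=worker, args=(i, doc))
               for i, doc in enumerate(documents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Same input shape as the report: '<html></html>' repeated 100 times.
    docs = ['<html></html>' * 100] * 10
    expected = [['html'] * 100] * 10
    print(run_threads(parse_one, docs) == expected)             # True
    print(run_threads(LockedParser().parse, docs) == expected)  # True
```

Option 1 is usually the simpler choice, since parser objects are generally cheap to create. Option 2 makes a single shared parser safe to call from several threads, but the lock serializes parsing, so the threads then only help by overlapping the fetching, not the parsing itself.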
Traceback:
==========

python sgml-not-threadSafe.py
Started all threads
Successfully parsed html
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "sgml-not-threadSafe.py", line 10, in parseHtml
    self._parser.feed( html )
  File "/usr/lib/python2.4/sgmllib.py", line 95, in feed
    self.goahead(0)
  File "/usr/lib/python2.4/sgmllib.py", line 129, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.4/sgmllib.py", line 262, in parse_starttag
    self.error('unexpected call to parse_starttag')
  File "/usr/lib/python2.4/sgmllib.py", line 102, in error
    raise SGMLParseError(message)
SGMLParseError: unexpected call to parse_starttag
Successfully parsed html
Successfully parsed html

Additional note
===============

To recreate this bug, you may need to run the sample code more than once. Thread scheduling isn't the same on every run: the issue is there, but sometimes it fails to appear on the first (second, third...) run.

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list