New submission from Winfried Plappert <winfried.plapp...@gmail.com>:
When parsing HTML that contains an empty element such as <td></td>, no call to handle_data is issued between handle_starttag and handle_endtag; it only comes afterwards. The problem is in HTMLParser.goahead, where the positions i and j are calculated. The code reads

    if i < j: self.handle_data(rawdata[i:j])

but it should be

    if i <= j: self.handle_data(rawdata[i:j])

If there is data between <td> and </td>, everything works fine. I just checked the 2.6 trunk: this occurs in line 142 of Lib/HTMLParser.py (file size 13407 bytes, dated 'Feb 26 19:25').

----------
components: Library (Lib)
messages: 102392
nosy: wplappert
severity: normal
status: open
title: HTMLparser does not handle call to handle_data when a tag contains no data.
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8319>
_______________________________________
Python-bugs-list mailing list
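The reported behaviour can be checked with a short script. This is a minimal sketch assuming Python 3, where the HTMLParser module lives at html.parser. EventRecorder, record and text_runs are hypothetical helper names; text_runs is a heavily simplified model of the text-run guard in goahead (it only scans for '<', skips tags with a plain find('>') and ignores entities), not the real parser code.

```python
# Sketch for Python 3: the 2.x HTMLParser module is html.parser here.
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Record every callback so the order of events can be inspected."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def record(markup):
    """Feed markup through EventRecorder and return the recorded events."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


def text_runs(rawdata, inclusive=False):
    """Hypothetical, simplified model of the text-run guard in goahead().

    inclusive=False mimics the original `if i < j`;
    inclusive=True mimics the proposed `if i <= j`.
    """
    runs = []
    i, n = 0, len(rawdata)
    while i < n:
        j = rawdata.find("<", i)   # start of the next tag, or end of input
        if j < 0:
            j = n
        if (i <= j) if inclusive else (i < j):
            runs.append(rawdata[i:j])  # text between the previous tag and this one
        if j == n:
            break
        end = rawdata.find(">", j)
        if end < 0:                # unterminated tag: stop in this toy model
            break
        i = end + 1                # skip over the tag itself
    return runs


print(record("<td></td>"))                    # no ("data", ...) event
print(record("<td>x</td>"))                   # ("data", "x") between start and end
print(text_runs("<td></td>"))                 # original guard
print(text_runs("<td></td>", inclusive=True)) # proposed guard
```

In this model the proposed `i <= j` guard yields an empty string whenever a tag directly follows the start of the input or another tag, so an empty handle_data call would also arrive before the first start tag, not only between <td> and </td>.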