Bugs item #1117302, was opened at 2005-02-06 15:04
Message generated for change (Comment added) made by effbot
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1117302&group_id=5470
Category: Python Library
Group: Python 2.4
>Status: Closed
>Resolution: Wont Fix
Priority: 5
Submitted By: Paul Birnie (pbirnie)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmllib.SGMLParser

Initial Comment:
sgmllib.SGMLParser calls the start_ and end_ tag methods correctly until it encounters

<a title="link1" href="url1">One</a> <br/><a title="link2" href="someurl2">Two</a> <a title="link2" href="url3">Three</a>

The <br/> seems to confuse the parser, and I only get callbacks for tag a twice (links 1 and 3).

----------------------------------------------------------------------

>Comment By: Fredrik Lundh (effbot)
Date: 2005-02-14 12:17

Message:
Logged In: YES 
user_id=38376

Closing, due to lack of feedback. Using HTMLParser instead of sgmllib should solve the problem.

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2005-02-08 09:14

Message:
Logged In: YES 
user_id=38376

Footnote 3: for the link case, also note that the HTMLParser module handles this in a more practical way (that is, it limits itself to the SGML features that are actually used on the web).

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2005-02-08 09:03

Message:
Logged In: YES 
user_id=38376

Footnote 2: if you need to deal with broken HTML, use TidyLib:

http://utidylib.berlios.de/
http://effbot.org/zone/element-tidylib.htm

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2005-02-08 09:01

Message:
Logged In: YES 
user_id=38376

Footnote: <br/> is an XML construct, and is not valid HTML. In HTML, "<tag/blah/" is short for "<tag>blah</tag>", so the BR section is parsed as

START br
DATA ><a title="link2" href="someurl2">Two<
END br
DATA a>

which is 100% correct.
For more on this topic, see: http://www.cs.tut.fi/~jkorpela/html/empty.html

Python-bugs-list mailing list
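The suggestion to switch from sgmllib to HTMLParser can be checked against the snippet from the initial comment with a short sketch. It assumes Python 3, where sgmllib no longer exists and the old HTMLParser module lives on as html.parser; the names LinkCollector and collect_links are hypothetical, chosen only for illustration. Because html.parser treats <br/> as a self-closing tag rather than as an SGML null end tag, it should report all three links.

```python
# Sketch assuming Python 3's html.parser (successor of the HTMLParser module).
# LinkCollector and collect_links are hypothetical illustration names.
from html.parser import HTMLParser

# The markup from the initial bug report.
SNIPPET = ('<a title="link1" href="url1">One</a> <br/>'
           '<a title="link2" href="someurl2">Two</a> '
           '<a title="link2" href="url3">Three</a>')


class LinkCollector(HTMLParser):
    """Record every start tag, plus the href of each <a> tag."""

    def __init__(self):
        super().__init__()
        self.tags = []   # every start tag name, in document order
        self.hrefs = []  # href values of <a> tags

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs. For <br/>, the default
        # handle_startendtag() calls this method and then handle_endtag(),
        # so "br" is recorded as an ordinary empty tag.
        self.tags.append(tag)
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))


def collect_links(markup):
    """Return the href of every <a> tag found in markup."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs


print(collect_links(SNIPPET))  # ['url1', 'someurl2', 'url3']
```

The second link, which sgmllib folded into the data of the br element, comes back here as a normal start tag.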