[issue7114] HTMLParser doesn't handle

2010-03-11 Thread Fredrik Lundh
Fredrik Lundh added the comment: And to clarify, XHTML is a reformulation of HTML4 using XML syntax, so you should use an XML parser to parse it, not an HTML parser. The formats are related, but not identical.
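
As an illustration of the advice to use an XML parser for XHTML, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The sample document and the helper name `script_text` are made up for this example and do not come from the report. An XML parser treats a CDATA section as plain character data, so the script body comes back unchanged.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample document, not taken from the issue report.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><script><![CDATA[ if (a < b && c) { run(); } ]]></script></body>"
    "</html>"
)


def script_text(xhtml):
    """Return the text of the first XHTML <script> element."""
    root = ET.fromstring(xhtml)
    # Elements live in the XHTML namespace, so the tag must be qualified.
    script = root.find(".//{%s}script" % XHTML_NS)
    return script.text


print(script_text(SAMPLE))
```

The `<` and `&&` inside the CDATA section would be illegal as bare XML text, which is the reason XHTML documents wrap script bodies this way.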

[issue7114] HTMLParser doesn't handle

2009-12-08 Thread Denis
Denis added the comment: CDATA sections are part of the XML specification (http://www.w3.org/TR/REC-xml/#sec-cdata-sect). HTML is not XML, so HTMLParser does the right thing here. -- nosy: +Denis
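
To see what the HTML parser actually does with a CDATA section on a given installation, a small recorder like the one below may help. It is a sketch only: it uses Python 3's `html.parser` module rather than the `HTMLParser` module the report was filed against, and the class name `EventRecorder`, the helper `record_events`, and the sample markup are all invented for the example. Which callback receives the CDATA section varies between Python versions, so no particular output is claimed here.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Record every parser callback so the handling of CDATA is visible."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def unknown_decl(self, data):
        # Marked sections such as <![CDATA[...]]> may be reported here.
        self.events.append(("unknown_decl", data))


def record_events(markup):
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return parser.events


# Hypothetical sample input, not taken from the issue report.
for event in record_events("<p><![CDATA[x < y]]></p>"):
    print(event)
```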

[issue7114] HTMLParser doesn't handle

2009-10-12 Thread Greg Baker
New submission from Greg Baker: I believe what I'm seeing here is somewhat related to issue 670664, but is easier to handle because of the CDATA structure. Basically, HTMLParser doesn't recognize CDATA sections at all, so their content is incorrectly parsed as normal data. The following is a