[issue18753] [c]ElementTree.fromstring fails to parse ]]>

Kees Bos Fri, 16 Aug 2013 09:49:04 -0700

Kees Bos added the comment:

I'm not an expert, but from: http://www.w3.org/TR/REC-xml/#NT-AttValue


        AttValue ::= '"' ([^<&"] | Reference)* '"'
                   | "'" ([^<&'] | Reference)* "'"

which I read as: any character or Reference is valid, except & and <, which 
are used for escaping and for closing the element.

The sequence <value>]]></value> also validates as well-formed at 
http://www.xmlvalidation.com/

The sequence <value>]></value> parses OK (so the failure only occurs with a 
double ] followed by >).

It's probably related to parsing <![CDATA[ ... ]]> (i.e. I guess that when the 
parser detects ]]> it assumes / requires the <![CDATA[ state, which is, of 
course, not the case here).

The sequence <value><![CDATA[foo]]></value> is parsed correctly:
>>> ET.fromstring('<value><![CDATA[foo]]></value>').text
'foo'
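
For reference, the three cases above can be reproduced side by side with a 
small script. This is only a sketch: try_parse is a hypothetical helper name 
(not part of ElementTree), and it assumes a Python version that provides 
ET.ParseError. It returns the exception instead of raising it so that all 
samples are reported.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Hypothetical helper: return the root's text, or the ParseError raised."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as exc:
        # Return the error rather than re-raising, so every sample is shown.
        return exc


samples = [
    '<value>]]></value>',              # bare ]]> in element content
    '<value>]></value>',               # single ] followed by >
    '<value><![CDATA[foo]]></value>',  # ]]> closing a real CDATA section
]

for sample in samples:
    print(repr(sample), '->', repr(try_parse(sample)))
```

Only the first sample should come back as a ParseError; the other two return 
the element text.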


BTW, lxml.etree.fromstring also fails, and so does 
http://www.w3schools.com/xml/xml_validator.asp

I'll ask around on the lxml mailing list what they think about this behavior.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18753>
_______________________________________