Hi

Unfortunately, I found another way to produce invalid XML values.

template1=# SELECT (XPATH('/*', XMLELEMENT(NAME "root", XMLATTRIBUTES('<' as xmlns))))[1];
       xpath       
-------------------
 <root xmlns="<"/>

Since a literal "<" is not allowed in XML attribute values, this XML value is 
not well-formed. And indeed, casting it to TEXT and back to XML fails:

template1=# SELECT (XPATH('/*', XMLELEMENT(NAME "root", XMLATTRIBUTES('<' as xmlns))))[1]::TEXT::XML;
ERROR:  invalid XML content
DETAIL:  Entity: line 1: parser error : Unescaped '<' not allowed in attributes values

Note that this only affects namespace declarations (xmlns). The following case 
works correctly:

template1=# SELECT (XPATH('/*', XMLELEMENT(NAME "root", XMLATTRIBUTES('<' as value))))[1];
        xpath         
----------------------
 <root value="&lt;"/>

The root of this issue is that "<" isn't a valid namespace URI to begin with, 
since "<" isn't in the set of characters allowed in URIs. Presumably for that 
reason, libxml doesn't escape xmlns attribute values when converting an XML 
node back to text.
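
To illustrate what uniform escaping would look like, here is a minimal Python 
sketch. It is not libxml and not the PostgreSQL code path; it only uses 
Python's standard library, and the helper names serialize_empty_element and 
is_well_formed are made up for this example. It escapes xmlns like any other 
attribute and then checks well-formedness with expat (with namespace 
processing left off, so xmlns is treated as a plain attribute).

```python
from xml.parsers.expat import ExpatError, ParserCreate
from xml.sax.saxutils import quoteattr


def serialize_empty_element(name, attrs):
    """Serialize <name .../>, escaping every attribute value the same way."""
    parts = [name] + [f"{key}={quoteattr(value)}" for key, value in attrs.items()]
    return "<" + " ".join(parts) + "/>"


def is_well_formed(text):
    """Return True if expat accepts the text (namespace processing is off)."""
    try:
        ParserCreate().Parse(text, True)
        return True
    except ExpatError:
        return False


escaped = serialize_empty_element("root", {"xmlns": "<"})
print(escaped)                              # <root xmlns="&lt;"/>
print(is_well_formed(escaped))              # True
print(is_well_formed('<root xmlns="<"/>'))  # False
```

Note that the escaped form is only well-formed; the namespace URI it declares 
is still "<", which is not a valid URI.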

I don't have a good solution for this issue yet. Special-casing attributes 
called "xmlns" (or "xmlns:<prefix>") in XMLATTRIBUTES() solves only part of the 
problem - the TEXT to XML cast is similarly lenient and doesn't complain if you 
do '<root xmlns="&lt;"/>'::XML.
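
For what it's worth, the special-casing idea could be sketched roughly as 
follows. This is Python for illustration only, not a patch against 
XMLATTRIBUTES(); the function names are hypothetical, and the check is a 
character-level approximation based on the characters RFC 3986 permits in a 
URI, not a full URI grammar. As said above, it would not cover the TEXT to XML 
cast.

```python
import re

# Characters RFC 3986 permits in a URI: unreserved, reserved, and "%".
# This is an approximation: it does not validate the URI's structure.
_URI_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def is_namespace_declaration(attr_name):
    """True for "xmlns" and for prefixed declarations such as "xmlns:foo"."""
    return attr_name == "xmlns" or attr_name.startswith("xmlns:")


def check_attribute(attr_name, value):
    """Raise ValueError for a namespace declaration with non-URI characters."""
    if is_namespace_declaration(attr_name) and not _URI_CHARS.fullmatch(value):
        raise ValueError(f"{attr_name}: {value!r} is not a valid URI")


check_attribute("value", "<")  # ordinary attribute, accepted
check_attribute("xmlns", "http://www.w3.org/1999/xhtml")  # accepted
try:
    check_attribute("xmlns", "<")
except ValueError as exc:
    print(exc)
```

Ordinary attributes pass through untouched, since escaping already handles 
them correctly; only xmlns and xmlns:<prefix> values are rejected.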

Why this cast succeeds is somewhat beyond me though - piping the very same XML 
document into xmllint produces

$ echo '<root xmlns="&lt;"/>' | xmllint -
-:1: namespace error : xmlns: '<' is not a valid URI

My nagging suspicion is that libxml reports errors like these via some callback 
function, and only returns a non-zero result if there are structural errors in 
the XML. But my experience with libxml is pretty limited, so maybe someone with 
more experience in this area can shed some light on this...

best regards,
Florian Pflug
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)