Hi While reviewing the (now applied) XPATH escaping patches, Radoslaw found one case where the previous failure of XPATH to escape its return value was offset by XMLATTRIBUTES insistence to escape all input values, even if they're already of type XML.
To wit, if you do SELECT XMLELEMENT(NAME "t", XMLATTRIBUTES('&'::XML AS "a")) you get xmlelement -------------------- <t a="&amp;"/> which is somewhat surprising. Especially since SELECT XMLELEMENT(NAME "t", '&'::XML); gives xmlelement -------------- <t>&</t> as one would except. Now, it seems rather unlikely that a real application would actually contain a query similar to the former one - you usually don 't store XML in the attributes of XML nodes. But since XPATH() returns XML, it's not unlikely that an application would do SELECT XMLELEMENT(NAME ..., XMLATTRIBUTES((XPATH(...))[1] AS ...)) As it stands, on 9.2 the values returned by the XPath expression will be escaped twice - once by XPATH and a second time by XMLATTRIBUTES, while on 9.1 the value will be escaped only once. Since this seems to be the most likely situation in which XMLATTRIBUTES would receive an input argument of type XML, and since it's not entirely logical for XMLATTRIBUTES to unconditionally escapes values already of type XML, I propose to change the behaviour of XMLATTRIBUTES as follows. Values not of type XML are be treated as they always have been, i.e. all special characters (<,>,&,",') are replaced by an entity reference (<, >, &, ", '). Values of type XML are assumed to be already escaped, so '&' isn't treated as a special character to avoid double escaping. The other special characters (<,>,",') are escaped as usual. The safety of this relies on the fact that '&' may never appear in a well-formed XML document, except as part of an entity reference. This seems to be the case, except if CDATA sections are involved, which may contain plain ampersands. So the actual escaping rule would have to be a bit more complex than I made it sound above - we'd need to detect CDATA sections, and re-enabled the escaping of '&' until the end of such a section. To actually implement this, I'd remove the use of libxml from the implementation of XMLELEMENT, and instead use our already existing escape_xml() function, enhanced with the ability to handle the partial escaping algorithm outlined above. We already only use libxml to escape attribute values, so doing that isn't a radical departure from xml.c's ways. As an added benefit, all the encoding-related problems of XMLELEMENT and XMLATTRIBUTES would go away once libxml is removed from this code path. So XPATH() would be the only remaining function which breaks in non-UTF-8 databases. Comments? Thoughts? Suggestions? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers