Hi Pavel

On 25.08.24 20:57, Pavel Stehule wrote:
>
> There is unwanted white space in the patch
>
> -<-><--><-->xmlFreeDoc(doc);
> +<->else if (format == XMLSERIALIZE_CANONICAL || format ==
> XMLSERIALIZE_CANONICAL_WITH_NO_COMMENTS)
> + <>{
> +<-><-->xmlChar *xmlbuf = NULL;
> +<-><-->int nbytes;
> +<-><-->int
>

I missed that one. Just removed it, thanks!

> 1. the xml is serialized to UTF8 string every time, but when target
> type is varchar or text, then it should be every time encoded to
> database encoding. Is not possible to hold utf8 string in latin2
> database varchar.

I'm calling xml_parse using GetDatabaseEncoding(), so I thought I would
be on the safe side

    if (format == XMLSERIALIZE_CANONICAL ||
        format == XMLSERIALIZE_CANONICAL_WITH_NO_COMMENTS)
        doc = xml_parse(data, XMLOPTION_DOCUMENT, false,
                        GetDatabaseEncoding(), NULL, NULL, NULL);

... or you mean something else?

> 2. The proposed feature can increase some confusion in implementation
> of NO IDENT. I am not an expert on this area, so I checked other
> databases. DB2 does not have anything similar. But Oracle's "NO IDENT"
> clause is very similar to the proposed "CANONICAL". Unfortunately,
> there is different behaviour of NO IDENT - Oracle's really removes
> formatting, Postgres does nothing.

Coincidentally, the [NO] INDENT support for xmlserialize is an old patch
of mine. With NO INDENT it basically "does nothing" and prints the xml
as is, e.g.

SELECT xmlserialize(DOCUMENT '<foo><bar><val z="1" a="8"><![CDATA[0&1]]></val></bar></foo>' AS text INDENT);
                xmlserialize
--------------------------------------------
 <foo>                                     +
   <bar>                                   +
     <val z="1" a="8"><![CDATA[0&1]]></val>+
   </bar>                                  +
 </foo>                                    +
 
(1 row)

SELECT xmlserialize(DOCUMENT '<foo><bar><val z="1" a="8"><![CDATA[0&1]]></val></bar></foo>' AS text NO INDENT);
                         xmlserialize
--------------------------------------------------------------
 <foo><bar><val z="1" a="8"><![CDATA[0&1]]></val></bar></foo>
(1 row)

SELECT xmlserialize(DOCUMENT '<foo><bar><val z="1" a="8"><![CDATA[0&1]]></val></bar></foo>' AS text);
                         xmlserialize
--------------------------------------------------------------
 <foo><bar><val z="1" a="8"><![CDATA[0&1]]></val></bar></foo>
(1 row)

.. while CANONICAL converts the xml to its canonical form,[1,2] e.g.
sorting attributes and replacing CDATA sections with their text content:

SELECT xmlserialize(DOCUMENT '<foo><bar><val z="1" a="8"><![CDATA[0&1]]></val></bar></foo>' AS text CANONICAL);
                     xmlserialize
------------------------------------------------------
 <foo><bar><val a="8" z="1">0&amp;1</val></bar></foo>
(1 row)

xmlserialize CANONICAL does not exist in any other database and it's not
part of the SQL/XML standard.
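
For anyone who wants to see the same kind of transformation without
building the patch, here is a minimal sketch in Python. It is only an
illustration, not what the patch does: it assumes the standard library's
xml.etree.ElementTree.canonicalize(), which implements C14N 2.0, whereas
the patch relies on libxml2 and C14N 1.1 [1,2]. For this small input the
visible effects (sorted attributes, CDATA replaced by escaped text,
optional comment removal) should be the same. The variable names are
made up for the example.

```python
from xml.etree.ElementTree import canonicalize

# Same document as in the xmlserialize examples above.
xml_in = '<foo><bar><val z="1" a="8"><![CDATA[0&1]]></val></bar></foo>'

# Attributes are emitted in sorted order and the CDATA section becomes
# ordinary character data, with '&' escaped as '&amp;'.
canonical = canonicalize(xml_in)
print(canonical)

# Comments are dropped by default and kept only when requested.
xml_commented = '<foo><!-- note --><bar/></foo>'
without_comments = canonicalize(xml_commented)
with_comments = canonicalize(xml_commented, with_comments=True)
print(without_comments)
print(with_comments)
```

The first print should match the row returned by the CANONICAL query
above.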

Regarding the different behaviour of NO INDENT in Oracle and PostgreSQL:
it is not entirely clear to me if SQL/XML states that NO INDENT must
remove the indentation from xml strings. It says:

"INDENT — the choice of whether to “pretty-print” the serialized XML by
means of indentation, either True or False.
....
i) If <XML serialize indent> is specified and does not contain NO, then
let IND be True.
ii) Otherwise, let IND be False."

When I wrote the patch I assumed it meant to leave the xml as is .. but
I might be wrong. Perhaps it would be best if we open a new thread for
this topic.

Thank you for reviewing this patch. Much appreciated!

Best,

-- 
Jim

1 - https://www.w3.org/TR/xml-c14n11/
2 - https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-c14n.html