On 10/25/18 09:56, Alvaro Herrera wrote: > Would you review Markus Winand patch here? > https://postgr.es/m/8bdb0627-2105-4564-aa76-7849f028b...@winand.at > I think doing that would probably point out a couple of ways in which > our XMLTABLE implementation is non-conformant, and then fixes it :-) > I've been unsure as to applying it to all branches since 10 or just to > master.
Well, modulo the key observation that it simply *is not* conformant until it accepts XML Query expressions and uses the XPath 2.0 type system and data model and the SQL/XML 2006+ casting rules ... ... and there is no ISO standard that says anything about how an XPath 1.0- based quasi-XMLTABLE-ish function ought to behave, so it's hard to say that anything this function does is right or wrong, per ISO ... I think all of the changes in these patches do make it a more useful quasi-XMLTABLE-ish function, as the pre-patch behaviors were less useful (if not outright bewildering). And they produce output that better matches what the XQuery-based ISO rules produce (for the subset of queries that mean the same thing as XQuery and as XPath 1.0). I also looked at the (not yet applied?) XML-XPath-comments-processing-instructions-array-ind patch and I think it, too, makes the behavior more useful. I did not actually take the time to build a PostgreSQL with the patch, but I took the two added regression queries, syntax-desugared them[1] and called the Saxon XQuery- based "xmltable" example with them, and got the same expected results: SELECT xmltable.* FROM (SELECT '<root><element>a1a<!-- aaaa -->a2a<?aaaaa pi?> <!--z--> bbbb<x>xxx</x>cccc</element></root>'::xml AS ".") AS p, "xmltable"('/root', PASSING => p, COLUMNS => ARRAY[ 'string(element)', 'string(element/comment()[2])', 'string(element/processing-instruction())', 'string(element/text()[1])', 'serialize(element)']) AS (element text, cmnt text, pi text, t1 text, x text); element | cmnt | pi | t1 | x ----------------------+------+----+-----+--------------------------------------------------------------------------------- a1aa2a bbbbxxxcccc | z | pi | a1a | <element>a1a<!-- aaaa -->a2a<?aaaaa pi?> <!--z--> bbbb<x>xxx</x>cccc</element> (1 row) SELECT xmltable.* FROM (SELECT '<root><element>a1a<!-- aaaa -->a2a<?aaaaa pi?> <!--z--> bbbb<x>xxx</x>cccc</element></root>'::xml AS ".") AS p, "xmltable"('/root', PASSING => p, COLUMNS => ARRAY[ 'string(element/text())', 'string(element/comment()[2])', 'string(element/processing-instruction())', 'string(element/text()[1])', 'serialize(element)']) AS (element text, cmnt text, pi text, t1 text, x text); ERROR: java.sql.SQLException: A sequence of more than one item is not allowed as the first argument of fn:string() (text("a1a"), text("a2a")) Agreement. Agreement is good. :) So I think they are worth applying. I can't bring myself to a strong opinion on whether they are or aren't worth backpatching; if it were the function described by the standard, they'd be bugs and they would be, but does making a non-standard function behave slightly more like the standard function that it isn't count as a bug fix or an enhancement? My overall feeling, at least in directing my own effort, is that I'd rather spend time toward getting the real XQuery-based semantics in place somehow, and ongoing enhancements to the XPath-1.0-based stuff feel more like pouring treasure down a hole. But these enhancements seem like good ones, and if there's interest in patching a couple more, the "unexpected XPath object type {2,3}" in [2] might be good candidates. That would be backpatchable, as the current behavior clearly isn't useful. One other thing: I think the commit message on the context-item patch is really somewhat misleading: "According to the SQL standard, the context of XMLTABLE's XPath row_expression is the document node of the XML input document..." Really the standard says nothing of the sort. It is a limitation of XPath 1.0 that the input even has to be a document at all. In the real XMLTABLE, you could be passing a context item that is a document node (of either 'document' or 'content' flavor), or a one-item 'sequence' holding an atomic type like a number or date, or even a naked XML node like a PI or comment or attribute node you're more used to seeing only inside a document. The real rule is just that the context item is exactly the thing you passed (or a copy of it, when the rules say so). It collapses in the XPath 1.0 case to having to be a document node, simply because that's the only thing you can pass in. What was wrong with the pre-patch code was that it wasn't just using the 'doc' it was given, but actually calling xmlDocGetRootElement() on it and setting the CI to one of its children. The patch just directly assigns doc to xpathctx->node, which I would call correct, not because it's a document node, but because it's the thing that was passed. -Chap [1] You might notice in addition to desugaring the XMLTABLE syntax, I wrapped the text-returning column paths in string(), and wrapped the xml-returning one in serialize() and changed its result column to text. Those changes are just to work around the parts of the Saxon example that aren't implemented yet, as explained in [3]. [2] https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#XMLTABLE [3] https://tada.github.io/pljava/examples/saxon.html#Using_the_Saxon_examples