2016-09-12 8:46 GMT+02:00 Craig Ringer <cr...@2ndquadrant.com>: > On 12 September 2016 at 13:07, Pavel Stehule <pavel.steh...@gmail.com> > wrote: > > >> Out of interest, should the syntax allow room for future expansion to > >> permit reading from file rather than just string literal / column > >> reference? It'd be ideal to avoid reading big documents wholly into > >> memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't > >> suggest adding that to this patch, just making sure adding it later > >> would not cause problems. > > > > > > this is little bit different question - it is server side function, so > first > > question is - how to push usually client side content to server? Next > > question is how to get this content to a executor. Now only COPY > statement > > is able to do. > > Probably start with support for server-side files. When people are > dealing with really big files they'll be more willing to copy files to > the server or bind them into the server file system over the network. > > The v3 protocol doesn't really allow any way for client-to-server > streaming during a query, I think that's hopeless until we have a > protocol bump. > > > updated patch attached - with your documentation. > > > >> - Can you send the sample data used to generate the example output? > >> I'd like to include at least a cut down part of it in the docs to make > >> it clear how the input correlates with output, and preferably put the > >> whole thing in an appendix. > > > > > > it is in regress tests. > > Makes sense. > > It's not that verbose (for XML) and I wonder if it's just worth > including it in-line in the docs along with the XMLTABLE example. It'd > be much easier to understand how XMLTABLE works and what it does then. > > >> - What effect does xmlnamespaces clause have? Does supplying it allow > >> you to reference qualified names in xpath? What happens if you don't > >> specify it for a document that has namespaces or don't define all the > >> namespaces? What if you reference an undefined namespace in xpath? > >> What about if an undefined namespace isn't referenced by xpath, but is > >> inside a node selected by an xpath expression? > > > > > > All this is under libxml2 control - when you use undefined namespace, > then > > libxml2 raises a error. The namespaces in document and in XPath queries > are > > absolutely independent - the relation is a URI. When you use bad URI > > (referenced by name), then the result will be empty set. When you use > > undefined name, then you will get a error. > > OK, makes sense. > > >> - What are the rules for converting the matched XML node into a value? > >> If the matched node is not a simple text node or lacks a text node as > >> its single child, what happens? > > > > This process is described and controlled by "XML SQL mapping". The > Postgres > > has minimalistic implementation without possibility of external control > and > > without schema support. The my implementation is simple. When user > doesn't > > specify result target like explicit using of text() function, then the > > text() function is used implicitly when target type is not XML. Then I > dump > > result to string and I enforce related input functions for target types. > > OK, so a subset of the full spec functionality is provided because of > limitations in Pg and libxml2. Makes sense. > > My only big concern here is that use of text() is a common mistake in > XSLT, and I think the same thing will happen here. Users expect > comments to be ignored, but in fact a comment inserts a comment node > into the XML DOM, so a comment between two pieces of text produces a > tree of > > element > text() > comment() > text() > > > If you match element/text(), you get a 2-node result and will presumably > ERROR. > > There is no good way to tell this from > > element > text() > element > text() > > when you use an xpath expression like element/text() . So you can't > safely solve it just by concatenating all resulting text() nodes > without surprising behaviour. > > >> - What happens if the matched has multiple text node children? This > >> can happen if, for example, you have something like > >> > >> <matchedNode> > >> some text <!-- comment splits up text node --> other text > >> </matchedNode> > > >
I fixed this case - new regress tests added > > > > depends on target type - it is allowed in XML, and it is disallowed for > > other types. I though about support of a arrays - but the patch will be > much > > more complex - there can be recursion - so I disallowed it. When the user > > have to solve this issue, then he can use nested XMLTABLE functions and > > nested function is working with XML type. > > I don't really understand how that'd work. > > Do you know how other implementations handle this? > > I think users are going to be VERY surprised when comments in text > break their XML. > > >> - Is there a way to get an attribute as a value? If so, an example > >> should show this because it's going to be a common need. Presumably > >> you want node/@attrname ? > > > > > > you can use reference to current node "." - so "./@attname" should to > work - > > a example is in regress tests > > cool, just needs mention in docs then. > > >> - PASSING clause isn't really defined. You can specify one PASSING > >> entry as a literal/colref/expression, and it's the argument xml > >> document, right? The external docs you referred to say that PASSING > >> may have a BY VALUE keyword, alias its argument with AS, and may have > >> expressions, e.g. > >> > >> PASSING BY VALUE '<x/>' AS a, '<y/>' AS b > >> > >> Neither aliases nor multiple entries are supported by the code or > >> grammar. Should this be documented as a restriction? > > > > > > The ANSI allows to pass more documents - and then do complex queries with > > XQuery. Passing more than one document has not sense in libxml2 based > > implementation, so I didn't supported it. The referenced names can be > > implemented later - but it needs to changes in XPATH function too. > > OK, so my docs addition that just says they're not supported should be > fine. > > >> - What does BY REF mean? Should this just be mentioned with a "see > >> xmlexists(...)" since it seems to be compatibility noise? Is there a > >> corresponding BY VALUE or similar? > > > > When the XML document is stored as serialized DOM, then by ref means > link on > > this DOM. It has not sense in Postgres - because we store XML documents > by > > value only. > > Right. And since there's already precent for xmlexists there's no > point worrying about whether we lose opportunities to implement it > later. > > >> - Why do the expression arguments take c_expr (anything allowed in > >> a_expr or b_expr), not b_expr (restricted expression) ? > > > > > > I don't know - I expect the problems with parser - because PASSING is > > restricted keyword in ANSI/SQL and unreserved keyword in Postgres. > > I mean for the rowpath argument, not the parts within > xmlexists_argument. If the rowpath doesn't need to be c_expr > presumably it should be a b_expr or even, if it doesn't cause parsing > ambiguities, an a_expr ? There doesn't seem to be the same issue here > as we have with BETWEEN etc. > b_expr enforces shift/reduce conflict :( > > >> - Column definitions are underdocumented. The grammar says they can be > >> NOT NULL, for example, but I don't see that in any of the references > >> you mailed me nor in the docs. What behaviour is expected for a NOT > >> NULL column? I've documented my best guess (not checked closely > >> against the code, please verify). > >> > > > > yes - some other databases allows it - I am thinking so it is useful. > > Sure. Sounds like my docs additions are probably right then, except > for incorrect description of DEFAULT. > I found other opened question - how we can translate empty tag to SQL value? The Oracle should not to solve this question, but PostgreSQL does. Some databases returns empty string. I prefer return a empty string - not null in this case. The reason is simple - Empty string is some information - and NULL is less information. When it is necessary I can transform empty string to NULL - different direction is not unique. Regards Pavel > > -- > Craig Ringer http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services >
xmltable-6.patch.gz
Description: GNU Zip compressed data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers