2017-08-20 4:17 GMT+02:00 Noah Misch <n...@leadboat.com>:

> On Fri, Aug 18, 2017 at 11:43:19PM +0200, Pavel Stehule wrote:
> > yes, probably libXML2 try to do check from utf8 encoding to header
> > specified encoding.
>
> Yes.  That has been the topic of this thread.
>
> > a) all values created by xml_in iterface are in database encoding - input
> > string is stored without any change. xml_parse is called only due
> > validation.
> >
> > b) inside xml_parse, the input is converted to UTF8, and document is read
> > by xmlCtxtReadDoc with explicitly specified "UTF-8" encoding or
> > by xmlParseBalancedChunkMemory with explicitly specified encoding "UTF8"
> > and removed decl section.
> >
> > So for "xml_parse" based functions (xml_in, texttoxml, xml_is_document,
> > wellformated_xml) the database encoding is not important
> >
> > c) xml_recv function does validation by xml_parse and translation to
> > database encoding.
> >
> > Now I don't see a difference between @b and @c - so my hypotheses about
> > necessity to use recv interface was wrong.
>
> Yes.  You posted, on 2017-04-05, a test case not requiring the recv
> interface.
>
> On Sat, Aug 19, 2017 at 09:13:50AM +0200, Pavel Stehule wrote:
> > I didn't find any info how to enable libXML2 XPath functions for other
> > encoding than UTF8 :( ??
>
> http://xmlsoft.org/encoding.html is the relevant authority.  To summarize, we
> should send only UTF8 to libxml2.

libxml2 converts XML documents to UTF8 by itself. Everything else passed to it
should already be in UTF8. I found some references to the xmlSwitchEncoding
function, but I didn't find any examples of its usage - probably nobody uses
it. The result is always in UTF8.

> On Sat, Aug 19, 2017 at 10:53:19PM +0200, Pavel Stehule wrote:
> > I am sending some POC - it does support XPATH and XMLTABLE for not UTF8
> > server encoding.
> >
> > In this case, all strings should be converted to UTF8 before call libXML2
> > functions, and result should be converted back from UTF8.
>
> Adding support for xpath in non-UTF8 databases is a v11 feature proposal.
> Please start a new thread for this, and add it to the open CommitFest.
>
> In this thread, would you provide the version of your patch that I described
> in my 2017-08-08 post to this thread?  That's a back-patchable bug fix.

There are three issues:

1. Processing of XML documents in single-byte encodings that carry an
   encoding declaration - this should be fixed by
   ecoding_for_xmlCtxtReadMemory.patch. This patch is very short and safe -
   it can be applied immediately (there are regression tests).

2. Encoding issues in XPath expressions (and namespaces) - because multibyte
   chars are not usually used in tag names, this issue hits very few users.

3. Encoding issues in XPath and XMLTABLE results - this is a bad issue - the
   XMLTABLE function will not work on non-UTF8 databases. Fortunately, few
   users run databases with such encodings, but this should probably be
   applied as a fix in Postgres 10/11.

I found some previous experiments:
https://marc.info/?l=pgsql-bugs&m=123407176408688

> https://wiki.postgresql.org/wiki/Todo#XML links to other background on this
> feature proposal.  See Tom Lane's review of a previous patch.  Ensure your
> patch does not have the problems he found during that review.  Do that before
> starting a thread for this feature.

Good information - thank you.
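
For issues 2 and 3, the idea from the POC (convert every string to UTF8 before
it reaches the XML library, then convert results back to the database
encoding) can be sketched roughly as follows. This is only an illustration in
Python, not the patch itself: xml.etree.ElementTree stands in for libxml2, the
helper name xpath_text_in_db_encoding is hypothetical, and ISO-8859-2 is just
an assumed example of a non-UTF8 database encoding.

```python
import xml.etree.ElementTree as ET


def xpath_text_in_db_encoding(doc, path, db_encoding):
    """Return the text of elements matching `path`, as bytes in db_encoding.

    Hypothetical helper: `doc` is an XML document (bytes, without an
    encoding declaration) stored in the database encoding.
    """
    # Step 1: convert the input from the database encoding to UTF-8.
    utf8_doc = doc.decode(db_encoding).encode("utf-8")

    # Step 2: hand only UTF-8 to the parser. With no encoding declaration,
    # the parser treats the bytes as UTF-8.
    root = ET.fromstring(utf8_doc)

    # Step 3: convert each result back to the database encoding.
    # ElementTree returns str, so encoding it plays the role of the
    # "convert back from UTF8" step.
    return [(el.text or "").encode(db_encoding) for el in root.findall(path)]


# Assumed example: non-ASCII tag name and value, stored as ISO-8859-2.
doc = "<root><jméno>Žluťoučký</jméno></root>".encode("iso-8859-2")
result = xpath_text_in_db_encoding(doc, "./jméno", "iso-8859-2")
```

The path expression is converted implicitly here because Python strings are
already Unicode; in C the XPath string and namespace names would need the same
explicit conversion as the document.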

I'll start a new thread for issues @2 and @3 - I'm not sure whether I can
prepare a good enough patch for the next commitfest - and later a committer
can decide whether to backpatch it.

Regards

Pavel

> Thanks,
> nm