On Jul 23, 2011, at 22:49, Peter Eisentraut wrote:
> On lör, 2011-07-23 at 17:49 +0200, Florian Pflug wrote:
>> The current thread about JSON and the ensuing discussion about the
>> XML types' behaviour in non-UTF8 databases made me try out how well
>> XPATH() copes with that situation. The code, at least, looks
>> suspicious - XPATH neither verifies that the server encoding is UTF-8,
>> nor does it pass the server encoding on to libxml's xpath functions.
>
> This issue is on the Todo list, and there are some archive links there.

Thanks for the pointer, but I think the discussion there doesn't really apply here.

First, I didn't suggest (or implement) full support for XPATH() together with server encodings other than UTF-8. My suggested patch simply closes a hole in the implementation of the current behaviour. Instead of relying on libxml to be able to detect that the encoding isn't UTF-8, it relies on it only to detect that the encoding isn't ASCII. Since all supported server encodings are supersets of ASCII, the latter is trivial.

xml.c also seems to have changed quite a bit since this was last discussed. Tom Lane argued against the proposed patch on the grounds that there are many more places in xml.c which pass strings to libxml without charset conversion. However, looking at it now, it seems that all XML validation goes through xml_parse(), which actually converts the XML to UTF-8. Only XPATH contains a separate code path, and it chooses to ignore encoding issues altogether.

best regards,
Florian Pflug