Re: [HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-20 Thread Tatsuo Ishii
> Merlin Moncure wrote on Mon, 12.01.2004 at 19:56:
> > Hannu Krosing wrote:
> > > IIRC, the charset transformations are done as a separate step in the
> > > wire protocol _before_ any parser has a chance to transform or not.
> >
> > Yep. My point is that this is wrong.
>
> Of course :) We need th

Re: [HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-15 Thread Hannu Krosing
Merlin Moncure wrote on Thu, 15.01.2004 at 18:43:
> Hannu Krosing wrote:
> > select
> > ''::xml == '\n\n'::xml
>
> Right: I understand your reasoning here. Here is the trick:
>
> select '[...]'::xml introduces a casting step which justifies a
> transformation. The original input data is not xm

Re: [HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-15 Thread Merlin Moncure
Hannu Krosing wrote:
> > In that case, treat the XML document like a binary stream, using
> > PQescapeBytea, etc. to encode if necessary pre-query. Also, the XML
> > domain should inherit from bytea, not varchar.
>
> why ?
>
> the allowed characters repertoire in XML is even less than in varcha

Re: [HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-15 Thread Hannu Krosing
Merlin Moncure wrote on Wed, 14.01.2004 at 15:49:
> Hannu Krosing wrote:
> > I hope that real as-needed-column-by-column translation will be used
> > with bound argument queries.
> >
> > It also seems possible to delegate the encoding changes to after the
> > query is parsed, but this will never w

Re: [HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-14 Thread Merlin Moncure
Hannu Krosing wrote:
> I hope that real as-needed-column-by-column translation will be used
> with bound argument queries.
>
> It also seems possible to delegate the encoding changes to after the
> query is parsed, but this will never work for EBCDIC and other funny
> encodings (like rot13 ;).
>

Re: [HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-13 Thread Hannu Krosing
Merlin Moncure wrote on Mon, 12.01.2004 at 19:56:
> Hannu Krosing wrote:
> > IIRC, the charset transformations are done as a separate step in the
> > wire protocol _before_ any parser has a chance to transform or not.
>
> Yep. My point is that this is wrong.

Of course :) It seems to be a quick hac

Re: [HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-12 Thread Merlin Moncure
Hannu Krosing wrote:
> IIRC, the charset transformations are done as a separate step in the
> wire protocol _before_ any parser has a chance to transform or not.

Yep. My point is that this is wrong. I think of XML the same way I think of a zip file that contains a text document. Postgres does not unzip a

Re: [HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-10 Thread Hannu Krosing
Merlin Moncure wrote on Fri, 09.01.2004 at 22:04:
> Peter Eisentraut wrote:
> > The central problem I have is this: How do we deal with the fact that
> > an XML datum carries its own encoding information?
>
> Maybe I am misunderstanding your question, but IMO postgres should be
> treating xml doc

Re: [HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-09 Thread Andrew Dunstan
Peter Eisentraut wrote:
> Andrew Dunstan wrote:
> > Perhaps the document should be stored in canonical form.
>
> That kills the DTD, the id attributes, thus crippling XPath, and it looks horrible on output. I don't think that can be accepted. Canonical form is useful for comparing documents, b

Re: [HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-09 Thread Peter Eisentraut
Andrew Dunstan wrote:
> Perhaps the document should be stored in canonical form.

That kills the DTD, the id attributes, thus crippling XPath, and it looks horrible on output. I don't think that can be accepted. Canonical form is useful for comparing documents, but not for operating on them, I

Re: [HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-09 Thread Merlin Moncure
Andrew Dunstan wrote:
> I think I agree with Rod's opinion elsewhere in this thread. I guess the
> "philosophical" question is this: If 2 XML documents with different
> encodings have the same canonical form, or perhaps produce the same DOM,
> are they equivalent? Merlin appears to want to say "no"

Re: [HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-09 Thread Andrew Dunstan
Perhaps the document should be stored in canonical form. See http://www.w3.org/TR/xml-c14n I think I agree with Rod's opinion elsewhere in this thread. I guess the "philosophical" question is this: If 2 XML documents with different encodings have the same canonical form, or perhaps produce the
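
The question of whether two differently encoded documents share a canonical form can be tried out with a small sketch. This is only an illustration, not anything proposed in the thread: it assumes Python 3.8+ and its standard-library `canonicalize`, which implements C14N 2.0 rather than the C14N 1.0 document linked above, and the sample documents are made up.

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical documents with the same content ("caf\u00e9" is "cafe" with
# an acute accent) but different declared encodings and different bytes.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\u00e9</name>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><name>caf\u00e9</name>'.encode("utf-8")

# canonicalize() parses the bytes (honouring each declaration) and returns the
# canonical form as a Python string; the XML declaration is dropped.
canon_latin1 = canonicalize(latin1_doc)
canon_utf8 = canonicalize(utf8_doc)

print(latin1_doc == utf8_doc)      # the stored bytes differ
print(canon_latin1 == canon_utf8)  # the canonical forms can still be compared
```

Under these assumptions the byte strings differ while the canonical forms compare equal, which is the sense in which canonical form is useful for comparison even if it is not what gets stored.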

Re: [HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-09 Thread Merlin Moncure
Peter Eisentraut wrote:
> The central problem I have is this: How do we deal with the fact that
> an XML datum carries its own encoding information?

Maybe I am misunderstanding your question, but IMO postgres should be treating xml documents as if they were binary data, unless the server takes on

Re: [HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-09 Thread Rod Taylor
> Rewriting the declaration seems like a workable solution, but it
> would break the transparency of the client/server encoding conversion.
> Also, some people might dislike that their documents are being changed
> as they are stored.

I presume that the XML type stores the textual representat

[HACKERS] Encoding problems in PostgreSQL with XML data

2004-01-09 Thread Peter Eisentraut
This is not directly related to current development, but it is something that might need a low-level solution. I've been thinking for some time about how to enhance the current "XML support" (e.g., contrib/xml). The central problem I have is this: How do we deal with the fact that an XML dat
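
The problem can be reproduced outside the database with a minimal sketch. It uses Python's standard-library parser, not contrib/xml, and the `transcode` helper and sample document are hypothetical stand-ins for a generic client/server text conversion that knows nothing about XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the declaration says ISO-8859-1 and the bytes agree.
# "caf\u00e9" is "cafe" with an acute accent on the final letter.
original = '<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\u00e9</name>'.encode("iso-8859-1")


def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes between encodings without looking at the XML declaration."""
    return data.decode(src).encode(dst)


# After conversion the bytes are UTF-8, but the declaration still claims ISO-8859-1.
converted = transcode(original, "iso-8859-1", "utf-8")

text_before = ET.fromstring(original).text
text_after = ET.fromstring(converted).text

print(repr(text_before))
print(repr(text_after))  # the parser trusts the stale declaration and misreads the bytes
```

In this sketch the converted document still parses, but its text no longer matches the original, because the embedded declaration and the actual byte encoding now disagree.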