Merlin Moncure wrote on Thu, 15.01.2004 at 18:43:
> Hannu Krosing wrote:
> > select
> > '<d/>'::xml == '<?xml version="1.0" encoding="utf-8"?>\n<d/>\n'::xml
>
> Right: I understand your reasoning here. Here is the trick:
>
> select '[...]'::xml introduces a casting step which justifies a
> transformation. The original input data is not xml, but varchar. Since
> there are no arbitrary rules on how to do this, we have some flexibility
> here to do things like change the encoding/mess with the whitespace. I
> am trying to find a way to break the assumption that my xml data
> necessarily has to be converted from raw text.
>
> My basic point is that we are confusing the roles of storing and
> parsing/transformation. The question is: are we storing xml documents
> or the metadata that makes up xml documents? We need to be absolutely
> clear on which role the server takes on...in fact both roles may be
> appropriate for different situations, but should be represented by a
> different type. I'll try and give examples of both situations.
>
> If we are strictly storing documents, IMO the server should perform zero
> modification on the document. Validation could be applied conceptually
> as a constraint (and, possibly XSLT/XPATH to allow a fancy type of
> indexing). However there is no advantage that I can see to manipulating
> the document except to break the 'C' of ACID. My earlier comments wrt
> binary encoding is that there simply has to be a way to prevent the
> server mucking with my document.
>
> For example, if I was using postgres to store XML-EDI documents in a DX
> system this is the role I would prefer. Validation and indexing are
> useful, but my expected use of the server is a type of electronic xerox
> of the incoming document. I would be highly suspicious of any
> modification the server made to my document for any reason.

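
To make the two roles concrete, here is a minimal sketch in Python. It is purely illustrative: `same_content` is a made-up helper, not anything PostgreSQL provides, and it assumes "same document" means "same parsed element tree". The two strings from the select above differ byte for byte (the "electronic xerox" view), yet compare equal once parsed (the "metadata" view).

```python
import xml.etree.ElementTree as ET


def same_content(a: bytes, b: bytes) -> bool:
    """Compare two XML documents by parsed content.

    The XML declaration and whitespace outside the root element are
    dropped by the parser, so they do not affect the result.
    (Hypothetical helper, for illustration only.)
    """
    def shape(elem):
        # Reduce an element to a plain, comparable structure.
        return (
            elem.tag,
            sorted(elem.attrib.items()),
            elem.text or "",
            elem.tail or "",
            [shape(child) for child in elem],
        )

    return shape(ET.fromstring(a)) == shape(ET.fromstring(b))


short = b'<d/>'
full = b'<?xml version="1.0" encoding="utf-8"?>\n<d/>\n'

print(short == full)              # False: different as stored bytes
print(same_content(short, full))  # True: same parsed content
```

A type that promises the first kind of equality cannot touch the bytes; a type that promises the second is free to re-encode or re-serialize.
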
The current charset/encoding support can be evil in some cases ;(

The only solution seems to be keeping both server and client encoding as
ASCII (or just disable it).

The proper path to encodings must unfortunately do the encoding
conversions *after* parsing, when it is known which parts of the
original query string should be changed.

Or, as you suggested, always encode anything outside plain ASCII (n<32
and n>127), both on input (can be done client-side) and output (IIRC
needs another type with a different output function).

> Based on your suggestions I think you are primarily concerned with the
> second example. However, in my work I do a lot of DX and I see the xml
> document as a binary object. Server-side validation would be extremely
> helpful, but please don't change my document!

So the problem is not exactly XML, but rather problems with changing
encodings of "binary" strings that should not be changed.

I hope (but I'm not sure) that keeping client and server encodings the
same should prevent that.

> So, I submit that we are both right for different reasons.

Seems so.

-----------------
Hannu
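
A minimal client-side sketch of the "encode anything outside plain ASCII" idea from the reply above, in Python. The helper name `to_ascii_xml` and the sample document are assumptions made up for illustration. It only covers characters above 127, by turning them into numeric character references; control characters below 32 are left alone, since XML 1.0 does not allow most of them even as references.

```python
import xml.etree.ElementTree as ET


def to_ascii_xml(text: str) -> bytes:
    """Return an ASCII-only serialization of an XML string.

    Every character above code point 127 becomes a decimal numeric
    character reference, so no charset conversion between client and
    server has anything left to change. (Hypothetical helper.)
    """
    return text.encode("ascii", errors="xmlcharrefreplace")


doc = '<d name="Jõgeva"/>'
wire = to_ascii_xml(doc)

print(wire)                               # b'<d name="J&#245;geva"/>'
print(ET.fromstring(wire).get("name"))    # Jõgeva
```

Note that this already changes the document's bytes on the client, so it fits the "metadata" role rather than the strict "xerox" role.
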