Perhaps the document should be stored in canonical form. See http://www.w3.org/TR/xml-c14n


I think I agree with Rod's opinion elsewhere in this thread. I guess the "philosophical" question is this: if two XML documents with different encodings have the same canonical form, or perhaps produce the same DOM, are they equivalent? Merlin appears to want to say "no", and I think I want to say "yes".
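
To make the question concrete, here is a minimal sketch, assuming Python 3.8+ and its standard-library xml.etree.ElementTree.canonicalize (which implements C14N 2.0, not the 1.0 spec linked above). The sample document and the two encodings are made up for illustration. It shows two byte-wise different documents collapsing to the same canonical form:

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical sample content; the same logical document in two encodings.
body = '<doc>caf\u00e9</doc>'
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode('iso-8859-1')
utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode('utf-8')

# As stored bytes, the two documents differ.
bytes_equal = latin1_doc == utf8_doc

# Canonicalization drops the XML declaration and normalizes the encoding,
# so the canonical forms can be compared directly.
canonical_equal = canonicalize(latin1_doc) == canonicalize(utf8_doc)

print(bytes_equal, canonical_equal)
```

Whether "canonical forms are equal" should count as "documents are equal" is exactly the point under debate; the sketch only shows that the two notions can disagree.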

cheers

andrew

Merlin Moncure wrote:

Peter Eisentraut wrote:


The central problem I have is this: How do we deal with the fact that
an XML datum carries its own encoding information?



Maybe I am misunderstanding your question, but IMO postgres should be treating xml documents as if they were binary data, unless the server takes on the role of a parser, in which case it should handle unspecified/unknown encodings just like a normal xml parser would (and this does *not* include changing the encoding!).

In my view, an XML parser should not change one bit of a document,
because that is not a 'parse' but a 'transformation'.



Rewriting the <?xml?> declaration seems like a workable solution, but it
would break the transparency of the client/server encoding conversion.
Also, some people might dislike that their documents are being changed
as they are stored.
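
For illustration only, a rough sketch of what rewriting the declaration during an encoding conversion could look like, written in Python. The transcode helper is hypothetical and is not anything PostgreSQL does; it assumes an XML 1.0 document, an ASCII-compatible source encoding, and a declaration (if present) at the very start of the data:

```python
import re

# Matches an XML declaration at the very start of the document (assumption:
# no byte-order mark or leading whitespace precedes it).
XML_DECL = re.compile(rb'^<\?xml[^>]*\?>')

def transcode(doc: bytes, src: str, dst: str) -> bytes:
    """Re-encode doc from src to dst and rewrite its declaration to match."""
    body = XML_DECL.sub(b'', doc, count=1).decode(src)
    decl = f'<?xml version="1.0" encoding="{dst}"?>'
    return (decl + body).encode(dst)

# Hypothetical sample document, as a client might send it.
original = '<?xml version="1.0" encoding="ISO-8859-1"?><doc>caf\u00e9</doc>'.encode('iso-8859-1')
stored = transcode(original, 'iso-8859-1', 'utf-8')

# The stored bytes no longer match what the client sent.
print(stored != original)
```

The declaration stays truthful after conversion, but the stored bytes differ from the submitted ones, which is the objection raised above.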



Right, your example begs the question: why does the server care what the encoding of the documents is (perhaps indexing)? XML validation is a standardized operation which the server (or psql, I suppose) can subcontract out to another application.

Just a side thought: what if the xml encoding type was built into the
domain type itself?
create domain xml_utf8 ...
Which allows casting, etc. which is more natural than an implicit
transformation.

Regards,
Merlin

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend




