Re: [PATCH] Add CANONICAL option to xmlserialize

Pavel Stehule Thu, 29 Aug 2024 11:51:32 -0700

On Tue, 27 Aug 2024 at 13:57, Jim Jones <jim.jo...@uni-muenster.de>
wrote:


>
>
> On 26.08.24 16:59, Pavel Stehule wrote:
> >
> > 1. What about the behaviour of NO INDENT? The implementation is fairly
> > recent, so it can still be changed if we want (I think), and it is better
> > to do that early than too late.
>
> While checking the feasibility of removing indentation with NO INDENT, I
> may have found a bug in XMLSERIALIZE ... INDENT.
> xmlSaveToBuffer seems not to indent elements if there is whitespace
> between them:
>
> SELECT xmlserialize(DOCUMENT '<foo><bar>42</bar></foo>' AS text INDENT);
>   xmlserialize
> -----------------
>  <foo>          +
>    <bar>42</bar>+
>  </foo>         +
>
> (1 row)
>
> SELECT xmlserialize(DOCUMENT '<foo> <bar>42</bar> </foo>'::xml AS text
> INDENT);
>         xmlserialize
> ----------------------------
>  <foo> <bar>42</bar> </foo>+
>
> (1 row)
>
> I'll take a look at it.
>

+1
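A guess at the cause: libxml2's formatted output appears to switch indentation off for an element as soon as that element has a text-node child, and the blanks between `<foo>` and `<bar>` are exactly such whitespace-only text nodes. One possible fix, which would also give NO INDENT something to do, is to drop whitespace-only text nodes first (roughly what libxml2's XML_PARSE_NOBLANKS parser option does) and only then serialize. The sketch below illustrates that idea with Python's standard-library ElementTree instead of libxml2; `strip_blank_text` and `serialize` are hypothetical helpers, not code from the patch, and stripping blanks this way also discards whitespace that may be significant in mixed content.

```python
import xml.etree.ElementTree as ET


def strip_blank_text(elem: ET.Element) -> None:
    """Recursively drop whitespace-only text and tail strings.

    Roughly what libxml2's XML_PARSE_NOBLANKS achieves at parse time.
    Caution: this also removes whitespace that may be significant in
    mixed content.
    """
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        strip_blank_text(child)
        if child.tail is not None and not child.tail.strip():
            child.tail = None


def serialize(doc: str, indent: bool) -> str:
    """Serialize doc either indented (INDENT) or compact (NO INDENT)."""
    root = ET.fromstring(doc)
    strip_blank_text(root)
    if indent:
        ET.indent(root, space="  ")  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


for doc in ("<foo><bar>42</bar></foo>", "<foo> <bar>42</bar> </foo>"):
    print(repr(serialize(doc, indent=True)))   # '<foo>\n  <bar>42</bar>\n</foo>'
    print(repr(serialize(doc, indent=False)))  # '<foo><bar>42</bar></foo>'
```

With the blanks removed up front, both inputs give the same indented and the same compact result, which is the behaviour the two psql examples above would presumably be expected to show.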


> Regarding removing indentation: yes, it would be possible with libxml2.
> The question is whether it would be right to do so.
> > 2. Are we able to implement SQL/XML syntax with libxml2?
> >
> > 3. Are we able to implement Oracle syntax with libxml2? And are there
> > benefits other than better compatibility?
> I guess it would be beneficial if you're migrating from Oracle to
> Postgres, or the other way around. It certainly wouldn't hurt, but so
> far I have personally had little use for Oracle's extra xmlserialize
> features.
> >
> > 4. Can there be some possible collision (functionality, syntax) with
> > CANONICAL?
> I couldn't find anything in the SQL/XML spec that might refer to
> canonical XML.
> >
> > 5. SQL/XML XMLSERIALIZE supports target types other than varchar. I
> > can imagine XMLSERIALIZE with CANONICAL to bytea (then we wouldn't need
> > to force the database encoding). Does that make sense? Are the results
> > comparable?
> As of PG16, bytea is not supported. Currently the type can be character,
> character varying, or text, as well as their other flavours like 'name'.
>

I know, but theoretically there could be some benefit for CANONICAL if PG
supported bytea there. A lot of databases still use non-UTF-8 encodings.

It is more of a theoretical question: if PG supports other target types
there in the future (because of SQL/XML or Oracle), can CANONICAL then be
used without restriction, or only for text? And are you sure that you can
compare text with text instead of xml with xml?
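Some background on the text-versus-xml point: Canonical XML is defined at the level of the serialized form (C14N output is always UTF-8, attribute order and quoting are normalized, the XML declaration is dropped), and two documents count as equivalent when their canonical forms are identical. That is also why a bytea target looks attractive: the canonical bytes would not depend on the database encoding. A small sketch using Python's standard-library `xml.etree.ElementTree.canonicalize`, which implements C14N 2.0; libxml2's canonicalizer implements C14N 1.0/1.1 and Exclusive C14N, so details such as namespace handling may differ from what the patch produces. The sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The same logical document stored in two encodings, with different
# attribute order and spacing inside the start tag (made-up sample data).
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
              '<p b="2"  a="1">Müller</p>').encode("latin-1")
utf8_doc = '<p a="1" b="2">Müller</p>'.encode("utf-8")


def canonical_bytes(doc: bytes) -> bytes:
    """Canonicalize doc; C14N output is defined as UTF-8-encoded text."""
    return ET.canonicalize(doc).encode("utf-8")


print(latin1_doc == utf8_doc)                                    # False
print(canonical_bytes(latin1_doc) == canonical_bytes(utf8_doc))  # True
```

The raw inputs differ byte for byte, but their canonical forms are identical, so comparing canonical serializations stands in for comparing the documents themselves, at least within what C14N considers significant.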

+SELECT xmlserialize(CONTENT doc AS text CANONICAL) = xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM xmltest_serialize;
+ ?column?
+----------
+ t
+ t
+(2 rows)

Maybe I am a little bit confused by these regression tests, because in the
end this one is not very useful: you compare two identical XML documents,
and WITH COMMENTS versus WITHOUT COMMENTS is tested elsewhere. I tried to
find the point of this test. It would be better to use genuinely different
documents (columns) instead.
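To illustrate what "really different documents" could look like: a pair that differs only in a comment and in attribute order/quoting should compare equal under CANONICAL WITHOUT COMMENTS but not under CANONICAL WITH COMMENTS, while a pair with a genuine content change should never compare equal. The sketch below again uses Python's ElementTree C14N 2.0 implementation as a stand-in for the patch's libxml2-based code; `c14n` and the documents `a`, `b`, `c` are hypothetical test data, not the contents of xmltest_serialize.

```python
import xml.etree.ElementTree as ET


def c14n(doc: str, with_comments: bool = False) -> str:
    """Canonical form of doc, optionally keeping comments."""
    return ET.canonicalize(doc, with_comments=with_comments)


# a and b differ only in a comment and in attribute order/quoting;
# c differs in actual content.
a = "<foo x='1' y='2'><!-- note --><bar>42</bar></foo>"
b = '<foo y="2" x="1"><bar>42</bar></foo>'
c = '<foo x="1" y="2"><bar>43</bar></foo>'

print(c14n(a) == c14n(b))                                          # True
print(c14n(a, with_comments=True) == c14n(b, with_comments=True))  # False
print(c14n(a) == c14n(c))                                          # False
```

Each comparison can only pass if the corresponding option actually does its job, which is the property the current test (two identical, comment-free documents) cannot show.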

Regards

Pavel


>
>
> --
> Jim
>
>
