For what it's worth, the deprecated serializer is now fixed [1].

Thanks.

[1] http://marc.info/?l=xerces-cvs&m=119455107025507&w=2

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
E-mail: [EMAIL PROTECTED]

Michael Glavassevich/Toronto/[EMAIL PROTECTED] wrote on 11/08/2007 02:33:55 PM:

> Hi Steve,
>
> Do you have serializer.jar (containing the LSSerializer from Xalan) on your
> classpath? I can only reproduce this with Xerces' implementation of
> LSSerializer, which, I might add, is also deprecated.
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: [EMAIL PROTECTED]
> E-mail: [EMAIL PROTECTED]
>
> "Steve Carton" <[EMAIL PROTECTED]> wrote on 11/08/2007
> 11:48:16 AM:
>
> > Hi Michael,
> >
> > I've fooled with this in several forms, always with the same
> > results. My current incarnation of the code uses the LSSerializer
> > API. I've also used the (deprecated) XMLSerializer. In either case,
> > I've tried StringWriter, FileWriter, and ByteArrayOutputStream (then
> > to a FileOutputStream to write to a file). I specify UTF-8 as the
> > output encoding. Here's a snippet of the code:
> >
> >       System.setProperty(DOMImplementationRegistry.PROPERTY,
> >           "org.apache.xerces.dom.DOMImplementationSourceImpl");
> >       DOMImplementationRegistry registry =
> >           DOMImplementationRegistry.newInstance();
> >       DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");
> >       DOMImplementationLS implLS = (DOMImplementationLS)domImpl;
> >       LSSerializer dom3Writer = implLS.createLSSerializer();
> >       LSOutput output=implLS.createLSOutput();
> >       ByteArrayOutputStream bs = new ByteArrayOutputStream();
> >       output.setByteStream(bs);
> >       output.setEncoding("UTF-8");
> >       dom3Writer.write(doc,output);
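To reproduce this without Xerces on the classpath, here is a minimal, self-contained variant of the snippet above. It is a sketch under stated assumptions: it uses the JDK's built-in DOM implementation (obtained by casting `doc.getImplementation()` to `DOMImplementationLS`) instead of `org.apache.xerces.dom.DOMImplementationSourceImpl`, so it exercises whichever LSSerializer your JDK ships, not Xerces' deprecated one. The class name `CdataUtf8Check` and the test string are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class CdataUtf8Check {

    // Builds <test><div>TEXT</div><divCDATA><![CDATA[TEXT]]></divCDATA></test>,
    // mirroring the structure of the output quoted in this thread.
    static Document buildDocument(String text) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("test");
        doc.appendChild(root);

        Element div = doc.createElement("div");
        div.appendChild(doc.createTextNode(text));
        root.appendChild(div);

        Element divCdata = doc.createElement("divCDATA");
        divCdata.appendChild(doc.createCDATASection(text));
        root.appendChild(divCdata);
        return doc;
    }

    // Serializes through the DOM Level 3 LS API into UTF-8 bytes,
    // using the same LSOutput/ByteArrayOutputStream setup as the snippet above.
    static byte[] serialize(Document doc) {
        // Assumption: the JDK's DOM implementation also implements DOMImplementationLS.
        DOMImplementationLS implLS = (DOMImplementationLS) doc.getImplementation();
        LSSerializer writer = implLS.createLSSerializer();
        LSOutput output = implLS.createLSOutput();
        ByteArrayOutputStream bs = new ByteArrayOutputStream();
        output.setByteStream(bs);
        output.setEncoding("UTF-8");
        writer.write(doc, output);
        return bs.toByteArray();
    }

    public static void main(String[] args) {
        Document doc = buildDocument("3 times: \u00F7 \u00F7 \u00F7");
        String xml = new String(serialize(doc), StandardCharsets.UTF_8);
        System.out.println(xml);
    }
}
```

If the CDATA section in the printed result is split around U+00F7 even though the output encoding is UTF-8, the serializer in use is treating an encodable character as unencodable.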
> >
> > Here's what gets written to a file from that byte stream:
> >
> > <test><div>¦º3 times: ÷ ÷ ÷º¬</div><divCDATA><![CDATA[¦º3 times: ]]>&#xf7;<![CDATA[ ]]>&#xf7;<![CDATA[ ]]>&#xf7;<![CDATA[º¬]]></divCDATA></test>
> >
> > Note that the serialized element that is *not* a CDATA section
> > converts the division symbol to UTF-8 without a problem.
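The `]]>&#xf7;<![CDATA[` pattern in the output above is what a serializer produces when it decides a character inside a CDATA section cannot be written in the output encoding: since a CDATA section cannot contain character references, it closes the section, writes a reference, and reopens it. The following is a simplified sketch of that logic, not Xerces' actual code; the class and method names are hypothetical, and it ignores the separate case of a literal `]]>` inside the text.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CdataSplitSketch {

    // Writes text as a CDATA section. Any code point the target charset
    // cannot represent is emitted outside the section as a hex character
    // reference, which forces the section to be closed and reopened around it.
    static String serializeCdata(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder("<![CDATA[");
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append("]]>&#x")
                   .append(Integer.toHexString(cp))
                   .append(";<![CDATA[");
            }
        });
        return out.append("]]>").toString();
    }

    public static void main(String[] args) {
        String text = "3 times: \u00F7 \u00F7 \u00F7";
        System.out.println(serializeCdata(text, StandardCharsets.UTF_8));
        System.out.println(serializeCdata(text, StandardCharsets.US_ASCII));
    }
}
```

With `US_ASCII` this yields the same `]]>&#xf7;<![CDATA[` splits seen above; with `UTF_8` the section stays intact, because U+00F7 is encodable in UTF-8.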
> >
> > Steve
> >
> > -----Original Message-----
> > From: Michael Glavassevich [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, November 07, 2007 11:04 PM
> > To: j-users@xerces.apache.org
> > Cc: Steve Carton
> > Subject: Re: Split CDATA Sections and the division Symbol (x00f7)
> >
> > Hi Steve,
> >
> > "Steve Carton" <[EMAIL PROTECTED]> wrote on 11/06/2007
> > 04:10:45 PM:
> >
> > > I'm trying to figure out if this is a bug or not. I created a DOM with
> > > an element with a CDATA section and I set the value to a String of
> > > characters which include a division symbol (xF7). (I actually do this
> > > by reading the characters in from a file and converting them from
> > > bytes to a String specifying a Windows-1252 encoding.) When I
> > > serialize this DOM out to a String, byte array or anything else, the
> > > CDATA section is split around the division symbol and the division
> > > symbol is emitted as a character reference (&#xF7;). I do try to
> > > serialize this as UTF-8.
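The decoding step described above can be checked in isolation. This sketch (class and method names are hypothetical) decodes the single byte 0xF7 as Windows-1252, confirms that it becomes U+00F7, and shows the two-byte UTF-8 form that a serializer writing UTF-8 should emit for it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DivisionSignBytes {

    // Decodes raw bytes read from a file as Windows-1252.
    static String decodeWindows1252(byte[] raw) {
        return new String(raw, Charset.forName("windows-1252"));
    }

    // Returns the UTF-8 bytes of a string as space-separated uppercase hex.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = decodeWindows1252(new byte[] { (byte) 0xF7 });
        // Prints the code point and its UTF-8 encoding: U+00F7 -> C3 B7
        System.out.printf("U+%04X -> %s%n", (int) s.charAt(0), utf8Hex(s));
    }
}
```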
> >
> > Some questions ...
> >
> > What API are you using for serialization? Are you specifying an
> > output encoding? What type of output are you writing to? A
> > java.io.OutputStream? A java.io.Writer?
> >
> > > I see in the documentation that this is the correct behavior when the
> > > serializer encounters a Unicode character that isn't recognized; I'm
> > > not sure whether this means not recognized in the Unicode (internal)
> > > form or that there is no UTF-8 equivalent. But x00F7 seems to be the
> > > correct Unicode value for a division symbol and there is a UTF-8
> > > encoding for it. Other "special" characters seem to serialize to UTF-8
> > > without this split.
> >
> > I think what you meant to say here is "not expressible in the output
> > encoding". For instance ASCII is only capable of representing
> > Unicode code points from 0x00-0x7F. 0xF7 isn't representable in ASCII.
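The "expressible in the output encoding" test can be asked directly of the JDK with `CharsetEncoder.canEncode`. This is a minimal sketch (the class and method names are hypothetical) that checks U+00F7 against a few charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodabilityCheck {

    // True if the given charset can represent the code point directly,
    // i.e. without falling back to a character reference.
    static boolean expressible(int codePoint, Charset charset) {
        return charset.newEncoder().canEncode(new String(Character.toChars(codePoint)));
    }

    public static void main(String[] args) {
        int divisionSign = 0x00F7;
        for (Charset cs : new Charset[] { StandardCharsets.US_ASCII,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8 }) {
            System.out.println(cs.name() + ": " + expressible(divisionSign, cs));
        }
    }
}
```

Running it prints `US-ASCII: false`, `ISO-8859-1: true` and `UTF-8: true`, so only an ASCII-limited output should force the division sign out of a CDATA section.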
> >
> > > I can send code. I've tried this on the latest Xerces-J. Anyone have
> > > any thoughts about it?
> > >
> > > Thanks,
> > >
> > > Steve Carton
> >
> > Thanks.
> >
> > Michael Glavassevich
> > XML Parser Development
> > IBM Toronto Lab
> > E-mail: [EMAIL PROTECTED]
> > E-mail: [EMAIL PROTECTED]
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
>
>


