Hi Steve,

Do you have serializer.jar (containing the LSSerializer from Xalan) on your
classpath? I can only reproduce this with Xerces' own implementation of
LSSerializer, which, I might add, is also deprecated.
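
One way to check which implementation is actually being picked up is to
print the serializer's class name and the location it was loaded from. This
is only a minimal sketch: the class name WhichSerializer and the helper
describe() are made up for illustration, and it assumes the registry can
find a DOM implementation that supports "LS 3.0". A class in the
org.apache.xml.serialize package would suggest the deprecated Xerces
serializer is in use.

```java
import java.security.CodeSource;

import org.w3c.dom.DOMImplementation;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class WhichSerializer {

    /** Hypothetical helper: reports a class name and where it was loaded from. */
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader may have no code source.
        String location = (source == null || source.getLocation() == null)
                ? "(no code source, probably the JDK runtime)"
                : source.getLocation().toString();
        return type.getName() + " loaded from " + location;
    }

    public static void main(String[] args) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");
        if (domImpl == null) {
            System.out.println("No DOM implementation supporting LS 3.0 was found");
            return;
        }
        LSSerializer serializer = ((DOMImplementationLS) domImpl).createLSSerializer();

        // The jar (or runtime image) named here shows which serializer is in use.
        System.out.println(describe(domImpl.getClass()));
        System.out.println(describe(serializer.getClass()));
    }
}
```

Running this with the same classpath as the failing program should show
whether the serializer comes from serializer.jar or from the Xerces jar.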

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

"Steve Carton" <[EMAIL PROTECTED]> wrote on 11/08/2007
11:48:16 AM:

> Hi Michael,
>
> I've fooled with this in several forms, always with the same
> results. My current incarnation of the code uses the LSSerializer
> API. I've also used the (deprecated) XMLSerializer. In either case,
> I've tried StringWriter, FileWriter, and ByteArrayOutputStream (then
> to a FileOutputStream to write to a file). I specify UTF-8 as the
> output encoding. Here's a snippet of the code:
>
>       System.setProperty(DOMImplementationRegistry.PROPERTY,
>           "org.apache.xerces.dom.DOMImplementationSourceImpl");
>       DOMImplementationRegistry registry =
>           DOMImplementationRegistry.newInstance();
>       DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");
>       DOMImplementationLS implLS = (DOMImplementationLS) domImpl;
>       LSSerializer dom3Writer = implLS.createLSSerializer();
>       LSOutput output = implLS.createLSOutput();
>       ByteArrayOutputStream bs = new ByteArrayOutputStream();
>       output.setByteStream(bs);
>       output.setEncoding("UTF-8");
>       dom3Writer.write(doc, output);
>
> Here's what gets written to a file from that byte stream:
>
> <test><div>¦º3 times: ÷ ÷ ÷º¬</div><divCDATA><![CDATA[¦º3
> times: ]]>&#xf7;<![CDATA[ ]]>&#xf7;<![CDATA[ ]]>&#xf7;
> <![CDATA[º¬]]></divCDATA></test>
>
> Note that the serialized element that is *not* a CDATA section
> converts the division symbol to UTF-8 without a problem.
>
> Steve
>
> -----Original Message-----
> From: Michael Glavassevich [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, November 07, 2007 11:04 PM
> To: j-users@xerces.apache.org
> Cc: Steve Carton
> Subject: Re: Split CDATA Sections and the division Symbol (x00f7)
>
> Hi Steve,
>
> "Steve Carton" <[EMAIL PROTECTED]> wrote on 11/06/2007
> 04:10:45 PM:
>
> > I'm trying to figure out if this is a bug or not. I created a DOM with
> > an element with a CDATA section and I set the value to a String of
> > characters which include a division symbol (xF7). (I actually do this
> > by reading the characters in from a file and converting them from
> > bytes to a String specifying a Windows-1252 encoding.) When I
> > serialize this DOM out to a String, byte array or anything else, the
> > CData section is split around the division symbol and the division
> > symbol is emitted as an entity (&#xF7;). I do try to serialize this as
> > UTF-8.
>
> Some questions ...
>
> What API are you using for serialization? Are you specifying an
> output encoding? What type of output are you writing to? A
> java.io.OutputStream? A java.io.Writer?
>
> > I see in the documentation that this is the correct behavior when the
> > serializer encounters a Unicode character that isn't recognized; not
> > sure if this means not recognized in the Unicode (internal) form or
> > there is no UTF-8 equivalent. But x00F7 seems to be the correct
> > Unicode value for a division symbol and there is a UTF-8 encoding for
> > it.  Other "special" characters seem to serialize to UTF-8 without
> > this split.
>
> I think what you meant to say here is "not expressible in the output
> encoding". For instance, ASCII can only represent Unicode code points
> 0x00-0x7F, so 0xF7 isn't representable in ASCII.
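
The "not expressible in the output encoding" test can be illustrated with
the JDK's own charset encoders. This is only a sketch of the idea, not the
check the serializer itself performs; the class name EncodableCheck and its
helpers are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodableCheck {

    /** True if the charset can represent the character directly. */
    static boolean canEncode(Charset charset, char c) {
        return charset.newEncoder().canEncode(c);
    }

    /** Formats bytes as space-separated upper-case hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char division = '\u00F7'; // DIVISION SIGN

        // ASCII stops at 0x7F, so the division sign cannot be encoded.
        System.out.println("US-ASCII: " + canEncode(StandardCharsets.US_ASCII, division));
        // UTF-8 covers all of Unicode, so it can.
        System.out.println("UTF-8:    " + canEncode(StandardCharsets.UTF_8, division));
        // The UTF-8 form of U+00F7 is the two bytes C3 B7.
        System.out.println("UTF-8 bytes: "
                + hex(String.valueOf(division).getBytes(StandardCharsets.UTF_8)));
    }
}
```

Since U+00F7 is encodable in UTF-8, a serializer writing UTF-8 should have
no need to split the CDATA section around it.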
>
> > I can send code. I've tried this on the latest Xerces-J. Anyone have
> > any thoughts about it?
> >
> > Thanks,
> >
> > Steve Carton
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: [EMAIL PROTECTED]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

