Hi Steve,

"Steve Carton" <[EMAIL PROTECTED]> wrote on 11/06/2007
04:10:45 PM:

> I'm trying to figure out if this is a bug or not. I created a DOM
> with an element with a CDATA section and I set the value to a String
> of characters which include a division symbol (xF7). (I actually do
> this by reading the characters in from a file and converting them
> from bytes to a String specifying a Windows-1252 encoding.) When I
> serialize this DOM out to a String, byte array or anything else, the
> CData section is split around the division symbol and the division
> symbol is emitted as an entity (&#xF7;). I do try to serialize this as
> UTF-8.

Some questions ...

What API are you using for serialization? Are you specifying an output
encoding? What type of output are you writing to? A java.io.OutputStream? A
java.io.Writer?
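
To make these questions concrete, here is a minimal sketch of one way to
serialize a DOM with an explicit output encoding to a byte stream, using
the DOM Level 3 Load and Save API. This is only an assumption about a
possible setup; I don't know which API you are actually using. The class
and method names (CdataSerializeSketch, serialize) are made up for
illustration.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper class, not taken from the original poster's code.
final class CdataSerializeSketch {

    // Builds <root><![CDATA[text]]></root>, serializes it to bytes in the
    // requested encoding, then decodes those bytes for inspection.
    static String serialize(String cdataText, String encoding) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            root.appendChild(doc.createCDATASection(cdataText));
            doc.appendChild(root);

            DOMImplementationLS ls = (DOMImplementationLS)
                    doc.getImplementation().getFeature("LS", "3.0");
            LSSerializer serializer = ls.createLSSerializer();

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            LSOutput output = ls.createLSOutput();
            output.setByteStream(bytes);   // an OutputStream, not a Writer
            output.setEncoding(encoding);  // output encoding stated explicitly
            serializer.write(doc, output);

            return bytes.toString(encoding);
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // \u00F7 is the division sign from the original question.
        System.out.println(serialize("10 \u00F7 2", "UTF-8"));
    }
}
```

With a java.io.Writer instead, the serializer only hands over characters
and the bytes are produced by whatever encoding the Writer was constructed
with, which is why the type of output matters.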

> I see in the documentation that this is the correct behavior when
> the serializer encounters a Unicode character that isn't recognized;
> not sure if this means not recognized in the Unicode (internal) form
> or there is no UTF-8 equivalent. But x00F7 seems to be the correct
> Unicode value for a division symbol and there is a UTF-8 encoding
> for it.  Other "special" characters seem to serialize to UTF-8
> without this split.

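I think what you meant to say here is "not expressible in the output
encoding". For instance, ASCII can only represent Unicode code points in
the range 0x00-0x7F, so 0xF7 isn't representable in ASCII. A serializer
writing ASCII would have to emit it as a character reference, and since
character references aren't recognized inside a CDATA section, the section
gets split around it.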
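
Whether a character is expressible in a given encoding can be checked
directly with java.nio.charset. The following is a small sketch, not part
of any serializer; the class name EncodingCheck and its canEncode helper
are made up for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for illustration only.
final class EncodingCheck {

    // Returns true if the named charset can represent the character.
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char division = '\u00F7'; // the division sign
        for (String name : new String[] {"US-ASCII", "ISO-8859-1", "UTF-8"}) {
            System.out.println(name + " can encode U+00F7: "
                    + canEncode(name, division));
        }
    }
}
```

US-ASCII reports false for U+00F7, while ISO-8859-1 and UTF-8 report true.
So if the division symbol comes out as a character reference, it is worth
checking which encoding the serializer actually believes it is writing.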

> I can send code. I've tried this on the latest Xerces-J. Anyone have
> any thoughts about it?
>
> Thanks,
>
> Steve Carton

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]


