Hi Steve,

"Steve Carton" <[EMAIL PROTECTED]> wrote on 11/06/2007 04:10:45 PM:
> I'm trying to figure out if this is a bug or not. I created a DOM
> with an element with a CDATA section and I set the value to a String
> of characters which include a division symbol (xF7). (I actually do
> this by reading the characters in from a file and converting them
> from bytes to a String specifying a Windows-1252 encoding.) When I
> serialize this DOM out to a String, byte array or anything else, the
> CData section is split around the division symbol and the division
> symbol is emitted as an entity (÷). I do try to serialize this as UTF-8.

Some questions ... What API are you using for serialization? Are you
specifying an output encoding? What type of output are you writing to?
A java.io.OutputStream? A java.io.Writer?

> I see in the documentation that this is the correct behavior when
> the serializer encounters a Unicode character that isn't recognized;
> not sure if this means not recognized in the Unicode (internal) form
> or there is no UTF-8 equivalent. But x00F7 seems to be the correct
> Unicode value for a division symbol and there is a UTF-8 encoding
> for it. Other "special" characters seem to serialize to UTF-8
> without this split.

I think what you meant to say here is "not expressible in the output
encoding". For instance ASCII is only capable of representing Unicode
code points from 0x00-0x7F. 0xF7 isn't representable in ASCII.

> I can send code. I've tried this on the latest Xerces-J. Anyone have
> any thoughts about it?
>
> Thanks,
>
> Steve Carton

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
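
To make the "not expressible in the output encoding" point concrete, here is a minimal sketch using only the JDK's java.nio.charset classes. It does not call Xerces and says nothing about which encoding the serializer actually selected in the case above; it only shows that U+00F7 (decoded from the Windows-1252 byte 0xF7) cannot be encoded as US-ASCII but can be as UTF-8. The class and helper names (EncodingCheck, decode, canEncode) are hypothetical and exist only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not part of Xerces).
final class EncodingCheck {

    // Decode raw bytes with the named charset, as when file bytes are
    // converted to a String with an explicit encoding.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    // Report whether the named charset can represent every character in text.
    static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Byte 0xF7 in Windows-1252 is the division sign, U+00F7.
        String division = decode(new byte[] {(byte) 0xF7}, "windows-1252");
        System.out.println("code point: U+00"
                + Integer.toHexString(division.codePointAt(0)).toUpperCase());

        // ASCII covers only 0x00-0x7F, so it cannot express U+00F7; UTF-8 can.
        for (String name : new String[] {"US-ASCII", "ISO-8859-1", "UTF-8"}) {
            System.out.println(name + " can encode U+00F7: "
                    + canEncode(name, division));
        }

        // In UTF-8 the character becomes two bytes: 0xC3 0xB7.
        for (byte b : division.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("0x%02X ", b);
        }
        System.out.println();
    }
}
```

If a check like this reports that the character is encodable in the encoding you intended, the next thing to look at is whether the serializer is really writing in that encoding, which is what the questions about the API, the output encoding and the output type are getting at.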