For what it's worth, the deprecated serializer is now fixed [1]. Thanks.
[1] http://marc.info/?l=xerces-cvs&m=119455107025507&w=2

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
E-mail: [EMAIL PROTECTED]

Michael Glavassevich/Toronto/[EMAIL PROTECTED] wrote on 11/08/2007 02:33:55 PM:

> Hi Steve,
>
> Do you have serializer.jar (containing the LSSerializer from Xalan) on your
> classpath? I can only reproduce this with Xerces' implementation of
> LSSerializer which I might add is also deprecated.
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: [EMAIL PROTECTED]
> E-mail: [EMAIL PROTECTED]
>
> "Steve Carton" <[EMAIL PROTECTED]> wrote on 11/08/2007 11:48:16 AM:
>
> > Hi Michael,
> >
> > I've fooled with this in several forms, always with the same
> > results. My current incarnation of the code uses the LSSerializer
> > API. I've also used the (deprecated) XMLSerializer. In either case,
> > I've tried StringWriter, FileWriter, and ByteArrayOutputStream (then
> > to a FileOutputStream to write to a file). I specify UTF-8 as the
> > output encoding. Here's a snippet of the code:
> >
> > System.setProperty(DOMImplementationRegistry.PROPERTY,
> >     "org.apache.xerces.dom.DOMImplementationSourceImpl");
> > DOMImplementationRegistry registry =
> >     DOMImplementationRegistry.newInstance();
> > DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");
> > DOMImplementationLS implLS = (DOMImplementationLS)domImpl;
> > LSSerializer dom3Writer = implLS.createLSSerializer();
> > LSOutput output=implLS.createLSOutput();
> > ByteArrayOutputStream bs = new ByteArrayOutputStream();
> > output.setByteStream(bs);
> > output.setEncoding("UTF-8");
> > dom3Writer.write(doc,output);
> >
> > Here's what gets written to a file from that byte stream:
> >
> > <test><div>¦º3 times: ÷ ÷ ÷º¬</div><divCDATA><![CDATA[¦º3 times: ]]>÷<![CDATA[ ]]>÷<![CDATA[ ]]>÷<![CDATA[º¬]]></divCDATA></test>
> >
> > Note that the serialized element that is *not* a cdata section
> > converts the division symbol to UTF-8 without a problem.
> >
> > Steve
> >
> > -----Original Message-----
> > From: Michael Glavassevich [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, November 07, 2007 11:04 PM
> > To: j-users@xerces.apache.org
> > Cc: Steve Carton
> > Subject: Re: Split CDATA Sections and the division Symbol (x00f7)
> >
> > Hi Steve,
> >
> > "Steve Carton" <[EMAIL PROTECTED]> wrote on 11/06/2007 04:10:45 PM:
> >
> > > I'm trying to figure out if this is a bug or not. I created a DOM with
> > > an element with a CDATA section and I set the value to a String of
> > > characters which include a division symbol (xF7). (I actually do this
> > > by reading the characters in from a file and converting them from
> > > bytes to a String specifying a Windows-1252 encoding.) When I
> > > serialize this DOM out to a String, byte array or anything else, the
> > > CData section is split around the division symbol and the division
> > > symbol is emitted as an entity (÷). I do try to serialize this as UTF-8.
> >
> > Some questions ...
> >
> > What API are you using for serialization? Are you specifying an
> > output encoding? What type of output are you writing to? A
> > java.io.OutputStream? A java.io.Writer?
> >
> > > I see in the documentation that this is the correct behavior when the
> > > serializer encounters a Unicode character that isn't recognized; not
> > > sure if this means not recognized in the Unicode (internal) form or
> > > there is no UTF-8 equivalent. But x00F7 seems to be the correct
> > > Unicode value for a division symbol and there is a UTF-8 encoding for
> > > it. Other "special" characters seem to serialize to UTF-8 without
> > > this split.
> >
> > I think what you meant to say here is "not expressible in the output
> > encoding". For instance ASCII is only capable of representing
> > Unicode code points from 0x00-0x7F.
> > 0xF7 isn't representable in ASCII.
> >
> > > I can send code. I've tried this on the latest Xerces-J. Anyone have
> > > any thoughts about it?
> > >
> > > Thanks,
> > >
> > > Steve Carton
> >
> > Thanks.
> >
> > Michael Glavassevich
> > XML Parser Development
> > IBM Toronto Lab
> > E-mail: [EMAIL PROTECTED]
> > E-mail: [EMAIL PROTECTED]
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
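
For anyone who wants to check the "not expressible in the output encoding" point from the thread without involving a serializer, here is a minimal sketch that uses only the JDK's java.nio.charset classes. It is not Xerces code and says nothing about how LSSerializer decides when to split a CDATA section. The class name DivisionSignEncoding and its helper methods are made up for illustration, and it assumes the running JDK ships the windows-1252 charset (common, but not guaranteed by the Java specification).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Xerces or the JDK.
class DivisionSignEncoding {

    /** Returns true if the named charset can represent the given character. */
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    /** Decodes raw bytes with the named charset, as when reading a legacy file. */
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    /** Formats bytes as upper-case hex pairs separated by spaces. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Byte 0xF7 read as Windows-1252 is the division sign, U+00F7.
        String s = decode(new byte[] {(byte) 0xF7}, "windows-1252");
        char c = s.charAt(0);

        System.out.println(Integer.toHexString(c));      // f7
        System.out.println(canEncode("US-ASCII", c));    // false
        System.out.println(canEncode("UTF-8", c));       // true
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8))); // C3 B7
    }
}
```

Since UTF-8 can encode U+00F7 directly (as the two bytes C3 B7), a serializer writing UTF-8 has no encoding reason to split the CDATA section around it, which is consistent with the behavior being treated as a bug in the deprecated serializer.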