pmkwan wrote:
> Can someone please explain why the parser is throwing this error:
>
> xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1e) was
> found in the CDATA section.
>   at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
>   at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
>   at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
>   at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
>   at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown Source)
>   at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
>
> I am using <?xml version="1.0" encoding="UTF-8"?> in my xml file and I set
> my outputStreamWriter to use UTF-8 as well. The data I captured was from
> our database and the character set is probably not UTF-8. Does that matter?

Yes, it does matter.

> I thought the parser is not supposed to parse anything within the CDATA
> section in the xml file. So why would this exception even happened?

Bytes are parsed into characters. Characters are then parsed for XML markup. CDATA only inhibits the second of those two processes. That is, a CDATA section must still contain valid data according to the character encoding of the document, and furthermore the characters must fall within the subset of characters permitted in XML.

There is no syntax that allows you to embed raw bytes within an XML document.

Max.
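
P.S. To illustrate the "subset of characters permitted in XML" point, here is a minimal Java sketch. It assumes you are happy to drop the offending characters before writing the CDATA section (rather than, say, Base64-encoding the data). The class and method names are made up for illustration; the ranges are the Char production of XML 1.0. 0x1E is the ASCII record separator, a control character outside those ranges, which is why the parser rejects it even inside CDATA.

```java
// Hypothetical helper, not part of Xerces or the JDK.
final class XmlCharFilter {

    // True if the code point matches the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input with every code point that XML 1.0
    // does not permit removed (including unpaired surrogates).
    static String stripInvalidXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A string containing the record separator from the error message.
        String raw = "record\u001Eseparator";
        System.out.println(stripInvalidXmlChars(raw)); // prints "recordseparator"
    }
}
```

Note that this only deals with illegal characters. If the database bytes are decoded with the wrong charset in the first place, the text will still be wrong even though it parses.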