pmkwan wrote:
> Can someone please explain why the parser is throwing this error:
> 
> xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1e) was found in the CDATA section.
>       at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
>       at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
>       at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
>       at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
>       at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown Source)
>       at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
> 
> 
> I am using <?xml version="1.0" encoding="UTF-8"?> in my xml file and I set
> my outputStreamWriter to use UTF-8 as well.  The data I captured was from
> our database and the character set is probably not UTF-8.  Does that matter?

Yes, it does matter.

> I thought the parser is not supposed to parse anything within the CDATA
> section in the xml file.  So why would this exception even happen?

Bytes are parsed into characters. Characters are then parsed for XML
markup. CDATA only inhibits the second of those two processes.

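That is, CDATA sections must still contain data that decodes validly
under the document's declared encoding, and the resulting characters
must also fall within the set of characters permitted in XML. In XML
1.0, U+001E is a control character outside that set, which is why the
parser rejects it even inside CDATA.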
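If it helps, here is a rough sketch of checking text against the XML 1.0
Char production before writing it out. The class and method names are
made up for illustration, and silently dropping characters is only one
possible policy (it loses data), so treat this as a starting point
rather than the fix for your application.

```java
// Hypothetical helper, not part of Xerces or the JDK.
final class XmlChars {
    private XmlChars() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input with disallowed code points dropped. */
    static String stripInvalidXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);        // handles surrogate pairs
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

You would run the database value through something like
stripInvalidXmlChars() before handing it to the writer. Wrapping the
result in CDATA makes no difference to which characters are allowed.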

There is no syntax that allows you to embed raw bytes within an XML
document.
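If the database column really holds arbitrary bytes rather than text,
the usual workaround is to encode the bytes as text first, for example
with Base64, and have the reader decode them again. A minimal sketch
using java.util.Base64 (Java 8 and later); the class and method names
are hypothetical:

```java
import java.util.Base64;

// Hypothetical helper: carries raw bytes through XML as Base64 text.
final class BinaryInXml {
    private BinaryInXml() {}

    /** Encodes raw bytes as Base64, which uses only XML-safe ASCII characters. */
    static String toXmlText(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    /** Reverses toXmlText() on the consuming side. */
    static byte[] fromXmlText(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

The Base64 alphabet is letters, digits, '+', '/' and '=', so the encoded
value can go in ordinary element content and needs no CDATA section.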

Max.
