Michael,

Thanks for your reply. The XSD does not allow mixed content. Attached are my test Java code, test XML and test XSD for your reference.

Also included below are the run logs for the Xerces parser and for the Sun parser. With the Xerces parser, the whitespace is reported through the characters() method and nothing is reported through ignorableWhitespace(). With the Sun parser, the text content is reported through characters() and the whitespace is reported through ignorableWhitespace(), as expected.

Joe

------------------------ Log for Xerces parser ---------------------------
factory = org.apache.xerces.jaxp.SAXParserFactoryImpl@110c424
parser = org.apache.xerces.jaxp.SAXParserImpl@1bd2664
startElement howto
characters = " "
startElement topic
characters = " "
startElement title
characters = "Java"
endElement title
characters = " "
...

---------------------- Log for Sun parser ---------------------------------
factory = com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl@1e8a1f6
parser = com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl@1e152c5
startElement howto
ignorableWhitespace = " "
startElement topic
ignorableWhitespace = " "
startElement title
characters = "Java"
endElement title
ignorableWhitespace = " "
...

-----Original Message-----
From: Michael Glavassevich [mailto:mrgla...@ca.ibm.com]
Sent: Friday, September 12, 2014 9:54 AM
To: j-users@xerces.apache.org
Subject: EXTERNAL: Re: SAX Parser includes ignorable whitespaces in the character() method

Your XML document requires a DTD with element declarations which specify that they contain element-only content. Without that a SAX parser cannot determine which whitespaces are 'ignorable'.

Thanks.

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

"Zhu, Joe" <joe....@lmco.com> wrote on 09/11/2014 07:00:11 PM:

> I am writing an app which needs to access all text content in XML.
> According to the ContentHandler API, this could be accomplished by
> using a validating parser and the characters() method.
>
> But with the Xerces parser, the characters() method can also receive
> ignorable whitespace (XML formatting whitespace). I have no way to
> tell whether the whitespace is ignorable or is part of the XML content.
>
> Has anybody else run into this problem? I tested with both Xerces 2.9.1
> and Xerces 2.11. They behave the same way.
>
> Joe Zhu

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
For additional commands, e-mail: j-users-h...@xerces.apache.org
SaxIgnorableWhiteSpaceTest.java
<?xml version="1.0" encoding="ISO-8859-1"?>
<howto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <topic>
    <title>Java</title>
    <url>http://www.rgagnon.com/topics/java-xml.html</url>
  </topic>
  <topic>
    <title>PowerBuilder</title>
    <url>http://www.rgagnon.com/topics/pb-powerscript.htm</url>
  </topic>
  <topic>
    <title>Javascript</title>
    <url>http://www.rgagnon.com/topics/js-language.html</url>
  </topic>
  <topic>
    <title>VBScript</title>
    <url>http://www.rgagnon.com/topics/wsh-vbs.html</url>
  </topic>
</howto>
<?xml version="1.0" encoding="ISO-8859-1"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="howto">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="topic" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="url" type="httpURI"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:simpleType name="httpURI">
    <xs:restriction base="xs:anyURI">
      <xs:pattern value="http://.*" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
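
Michael's point about DTD element declarations can be illustrated with a minimal sketch. It is not the attached test code. It assumes a trimmed copy of the test document with a hypothetical internal DTD subset added (the original attachment has no DTD), drops the xmlns:xsi attribute so the document stays DTD-valid, and uses whichever JAXP parser SAXParserFactory.newInstance() returns. The class, field and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class IgnorableWhitespaceSketch {

    // Assumption: one topic from the test document, plus an internal DTD
    // subset that declares howto and topic as element-only content.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE howto [\n"
        + "  <!ELEMENT howto (topic+)>\n"
        + "  <!ELEMENT topic (title, url)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "  <!ELEMENT url (#PCDATA)>\n"
        + "]>\n"
        + "<howto>\n"
        + "  <topic>\n"
        + "    <title>Java</title>\n"
        + "    <url>http://www.rgagnon.com/topics/java-xml.html</url>\n"
        + "  </topic>\n"
        + "</howto>\n";

    // Collects text content and counts ignorable whitespace separately.
    static class Recorder extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int ignorableChars = 0;

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorableChars += length;
        }
    }

    static Recorder parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true); // DTD validation, not XSD validation
        SAXParser parser = factory.newSAXParser();
        Recorder recorder = new Recorder();
        parser.parse(new InputSource(new StringReader(xml)), recorder);
        return recorder;
    }

    public static void main(String[] args) throws Exception {
        Recorder r = parse(XML);
        System.out.println("characters = \"" + r.text + "\"");
        System.out.println("ignorable whitespace chars = " + r.ignorableChars);
    }
}
```

With a validating parser and these declarations, the ContentHandler contract requires whitespace in element content to be reported through ignorableWhitespace(), so r.text should hold only the title and URL text. This covers the DTD route only; it says nothing about how a given parser treats the XSD-only case shown in the logs.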