Michael,
Thanks for your reply. The XSD does not allow mixed content. Attached are my 
test Java code, test XML, and test XSD for your reference. 

Also included below are the run logs for the Xerces parser and for the Sun parser. 
When the test runs with the Xerces parser, the whitespace is reported in the 
characters() method and nothing is reported in ignorableWhitespace(). 
But when it runs with the Sun parser, the text content is reported in 
characters() and the whitespace is reported in the ignorableWhitespace() 
method, as expected.

Joe

------------------------ Log for Xerces parser ---------------------------
factory = org.apache.xerces.jaxp.SAXParserFactoryImpl@110c424
parser = org.apache.xerces.jaxp.SAXParserImpl@1bd2664
startElement howto
characters = "
  "
startElement topic
characters = "
      "
startElement title
characters = "Java"
endElement title
characters = "
      "
...

---------------------- Log for Sun parser ---------------------------------
factory = com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl@1e8a1f6
parser = com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl@1e152c5
startElement howto
ignorableWhitespace = "
  "
startElement topic
ignorableWhitespace = "
      "
startElement title
characters = "Java"
endElement title
ignorableWhitespace = "
      "
...


-----Original Message-----
From: Michael Glavassevich [mailto:mrgla...@ca.ibm.com] 
Sent: Friday, September 12, 2014 9:54 AM
To: j-users@xerces.apache.org
Subject: EXTERNAL: Re: SAX Parser includes ignorable whitespaces in the 
character() method

Your XML document requires a DTD with element declarations which specify that 
they contain element-only content. Without that a SAX parser cannot determine 
which whitespaces are 'ignorable'.
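
As a rough illustration of the point above, here is a minimal sketch (it is
not the attached SaxIgnorableWhiteSpaceTest.java). It parses a trimmed copy
of the test document twice, once without a DTD and once with an internal DTD
subset that declares howto and topic as element-only content, and counts
where the formatting whitespace is reported. The class name, the Counter
helper, and the DTD itself are assumptions made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class IgnorableWhitespaceDemo {

    // Hypothetical internal DTD subset (not part of the original test files):
    // howto and topic are declared with element-only content.
    static final String DTD =
        "<!DOCTYPE howto [\n"
        + "  <!ELEMENT howto (topic+)>\n"
        + "  <!ELEMENT topic (title, url)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "  <!ELEMENT url (#PCDATA)>\n"
        + "]>\n";

    // Trimmed copy of the test document; the text content has no whitespace.
    static final String BODY =
        "<howto>\n"
        + "  <topic>\n"
        + "      <title>Java</title>\n"
        + "      <url>http://www.rgagnon.com/topics/java-xml.html</url>\n"
        + "  </topic>\n"
        + "</howto>\n";

    /** Counts whitespace characters delivered through each SAX callback. */
    static final class Counter extends DefaultHandler {
        int whitespaceInCharacters; // whitespace seen in characters()
        int ignorable;              // chars seen in ignorableWhitespace()

        @Override
        public void characters(char[] ch, int start, int length) {
            for (int i = start; i < start + length; i++) {
                if (Character.isWhitespace(ch[i])) {
                    whitespaceInCharacters++;
                }
            }
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorable += length;
        }
    }

    /** Parses the given XML with whatever JAXP SAX parser is on the classpath. */
    static Counter parse(String xml, boolean validating) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validating); // DTD validation only
            SAXParser parser = factory.newSAXParser();
            Counter counter = new Counter();
            parser.parse(new InputSource(new StringReader(xml)), counter);
            return counter;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        Counter noDtd = parse(BODY, false);
        Counter withDtd = parse(DTD + BODY, true);
        System.out.println("no DTD:   characters() whitespace = "
            + noDtd.whitespaceInCharacters + ", ignorableWhitespace() = " + noDtd.ignorable);
        System.out.println("with DTD: characters() whitespace = "
            + withDtd.whitespaceInCharacters + ", ignorableWhitespace() = " + withDtd.ignorable);
    }
}
```

Without the DTD the parser has no content model to consult, so every
whitespace run arrives through characters(). With the DTD the same runs
should arrive through ignorableWhitespace() instead. This sketch covers only
the DTD case; it does not exercise XSD validation.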

Thanks.

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

"Zhu, Joe" <joe....@lmco.com> wrote on 09/11/2014 07:00:11 PM:

> I am writing an app which needs to access all text content in XML. 
> According to the ContentHandler API, this could be accomplished by 
> using a validating parser and the characters() method.
> 
> But with the Xerces parser, the characters() method could contain 
> ignorable whitespaces (XML formatting whitespaces). I have no way to 
> tell if the whitespace is ignorable whitespace or is part of the XML 
> content.
> 
> Has anybody else run into the problem? I tested with both Xerces 
> 2.9.1 and Xerces 2.11. They behave the same way.
> 
> Joe Zhu


---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
For additional commands, e-mail: j-users-h...@xerces.apache.org

Attachment: SaxIgnorableWhiteSpaceTest.java

<?xml version="1.0" encoding="ISO-8859-1"?>
<howto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <topic>
      <title>Java</title>
      <url>http://www.rgagnon.com/topics/java-xml.html</url>
  </topic>
  <topic>
      <title>PowerBuilder</title>
      <url>http://www.rgagnon.com/topics/pb-powerscript.htm</url>
  </topic>
  <topic>
        <title>Javascript</title>
        <url>http://www.rgagnon.com/topics/js-language.html</url>
  </topic>
  <topic>
        <title>VBScript</title>
        <url>http://www.rgagnon.com/topics/wsh-vbs.html</url>
  </topic>
</howto>
<?xml version="1.0" encoding="ISO-8859-1"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="howto">
     <xs:complexType>
      <xs:sequence>
        <xs:element name="topic" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="url" type="httpURI"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  
  <xs:simpleType name="httpURI">
      <xs:restriction base="xs:anyURI">
        <xs:pattern value="http://.*" />
      </xs:restriction>
  </xs:simpleType>
 
</xs:schema>