Hmm. Why does it distinguish between DTD and XSD? They are both schema 
definitions!

It is useful to be able to distinguish between ignorable whitespace and 
whitespace that is part of the content. Xerces can't, which makes it less 
useful. It also violates the W3C XML Recommendation, as shown below.

Everything I have read implies that ignorable whitespace shall be reported in 
ignorableWhitespace():

------------------------ org.xml.sax.ContentHandler API 
(http://docs.oracle.com/javase/6/docs/api/index.html?javax/xml/stream/package-summary.html)
 ----------------

ignorableWhitespace
---
Validating Parsers must use this method to report each chunk of whitespace in 
element content (see the W3C XML 1.0 recommendation, section 2.10): 
non-validating parsers may also use this method if they are capable of parsing 
and using content models.

--------------------- W3C XML Recommendation  
(http://www.w3.org/TR/REC-xml/#sec-white-space) 
-----------------------------------------------------------------------
2.10 White Space Handling
...
An XML processor MUST always pass all characters in a document that are not 
markup through to the application. A validating XML processor MUST also inform 
the application which of these characters constitute white space appearing in 
element content.
---------------------------------------------------------------------------------------------------------------------------
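
For reference, here is a minimal sketch of the kind of test that shows where 
the whitespace ends up. It is not the attached test code: the class name 
XsdWhitespaceProbe, the inline XML and XSD, and the use of 
SAXParserFactory.setSchema() are assumptions made for illustration. How the 
whitespace is split between the two counters depends on the parser found on 
the classpath.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class XsdWhitespaceProbe {
    // Hypothetical schema: howto and topic are element-only, title is text.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"howto\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"topic\" maxOccurs=\"unbounded\">"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name=\"title\" type=\"xs:string\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Hypothetical instance document with formatting whitespace only.
    static final String XML =
        "<howto>\n  <topic>\n    <title>Java</title>\n  </topic>\n</howto>";

    // Records non-whitespace text and counts whitespace per callback.
    static class Counts extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int inCharacters;
        int inIgnorable;

        @Override
        public void characters(char[] ch, int start, int length) {
            for (int i = start; i < start + length; i++) {
                if (Character.isWhitespace(ch[i])) {
                    inCharacters++;
                } else {
                    text.append(ch[i]);
                }
            }
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            inIgnorable += length;
        }
    }

    static Counts parse() throws Exception {
        SchemaFactory sf =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema); // validate against the XSD while parsing

        Counts counts = new Counts();
        factory.newSAXParser()
               .parse(new InputSource(new StringReader(XML)), counts);
        return counts;
    }

    public static void main(String[] args) throws Exception {
        Counts c = parse();
        System.out.println("text = \"" + c.text + "\"");
        System.out.println("whitespace via characters() = " + c.inCharacters);
        System.out.println("whitespace via ignorableWhitespace() = " + c.inIgnorable);
    }
}
```

A parser that treats XSD element-only content as ignorable would put the 
whitespace in the second counter; one that does not would put it in the first.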

Joe 

-----Original Message-----
From: Michael Glavassevich [mailto:mrgla...@ca.ibm.com] 
Sent: Monday, September 15, 2014 9:40 AM
To: j-users@xerces.apache.org
Subject: RE: EXTERNAL: Re: SAX Parser includes ignorable whitespaces in the 
character() method

ignorableWhitespace() was only defined for use with DTDs. Sun's implementation 
may be doing something for XSD but there's nothing in the specification which 
requires that. Xerces is behaving correctly.
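
A minimal sketch of the DTD case, for comparison. The class name 
DtdWhitespaceDemo and the inline document are assumptions made for 
illustration. The internal DTD subset declares howto and topic as 
element-only content, so a validating SAX parser can identify the whitespace 
between their children and report it through ignorableWhitespace().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DtdWhitespaceDemo {
    // Hypothetical document: the DTD gives howto and topic element-only
    // content models, and title a text-only content model.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE howto [\n"
      + "  <!ELEMENT howto (topic+)>\n"
      + "  <!ELEMENT topic (title)>\n"
      + "  <!ELEMENT title (#PCDATA)>\n"
      + "]>\n"
      + "<howto>\n"
      + "  <topic>\n"
      + "    <title>Java</title>\n"
      + "  </topic>\n"
      + "</howto>\n";

    // Collects text from characters() and counts ignorable whitespace.
    static class Result extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int ignorableChars;

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorableChars += length;
        }
    }

    static Result parse() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // DTD validation

        Result result = new Result();
        factory.newSAXParser()
               .parse(new InputSource(new StringReader(XML)), result);
        return result;
    }

    public static void main(String[] args) throws Exception {
        Result r = parse();
        System.out.println("characters = \"" + r.text + "\"");
        System.out.println("ignorable whitespace chars = " + r.ignorableChars);
    }
}
```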

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

"Zhu, Joe" <joe....@lmco.com> wrote on 09/15/2014 09:41:33 AM:

> Michael,
> Thanks for your reply. The XSD does not allow mixed content. 
> Attached is my test Java code, test XML and test XSD for your reference. 
> 
> Also included below is the run log for the Xerces parser and for a Sun 
> parser. When it runs with the Xerces parser, the whitespace is reported in 
> the characters() method and nothing is reported in ignorableWhitespace(). 
> But when it runs with the Sun parser, the text content is reported in 
> characters() and the whitespace is reported in the ignorableWhitespace() 
> method, as expected.
> 
> Joe
> 
> ------------------------ Log for Xerces parser ---------------------------
> factory = org.apache.xerces.jaxp.SAXParserFactoryImpl@110c424
> parser = org.apache.xerces.jaxp.SAXParserImpl@1bd2664
> startElement howto
> characters = "
>   "
> startElement topic
> characters = "
>       "
> startElement title
> characters = "Java"
> endElement title
> characters = "
>       "
> ...
> 
> ---------------------- Log for Sun parser ---------------------------------
> factory = com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl@1e8a1f6
> parser = com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl@1e152c5
> startElement howto
> ignorableWhitespace = "
>   "
> startElement topic
> ignorableWhitespace = "
>       "
> startElement title
> characters = "Java"
> endElement title
> ignorableWhitespace = "
>       "
> ...
> 
> 
> -----Original Message-----
> From: Michael Glavassevich [mailto:mrgla...@ca.ibm.com]
> Sent: Friday, September 12, 2014 9:54 AM
> To: j-users@xerces.apache.org
> Subject: EXTERNAL: Re: SAX Parser includes ignorable whitespaces in 
> the character() method
> 
> Your XML document requires a DTD with element declarations which 
> specify that they contain element-only content. Without that a SAX 
> parser cannot determine which whitespaces are 'ignorable'.
> 
> Thanks.
> 
> Michael Glavassevich
> XML Technologies and WAS Development
> IBM Toronto Lab
> E-mail: mrgla...@ca.ibm.com
> E-mail: mrgla...@apache.org
> 
> "Zhu, Joe" <joe....@lmco.com> wrote on 09/11/2014 07:00:11 PM:
> 
> > I am writing an app which needs to access all text content in XML. 
> > According to the ContentHandler API, this could be accomplished by 
> > using a validating parser and the characters() method.
> > 
> > But with the Xerces parser, the characters() method can receive 
> > ignorable whitespace (XML formatting whitespace). I have no way to 
> > tell if the whitespace is ignorable whitespace or is part of the XML 
> > content.
> > 
> > Has anybody else run into the problem? I tested with both Xerces 
> > 2.9.1 and Xerces 2.11. They behave the same way.
> > 
> > Joe Zhu
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
> For additional commands, e-mail: j-users-h...@xerces.apache.org

