The specification you are quoting from is only concerned with DTDs. See 
the definition of validating XML processors here [1].
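For comparison, the DTD case can be reproduced with a plain JAXP SAX parser. This is an illustrative sketch only. The class name, the sample document (modeled loosely on the howto/topic/title elements in the logs further down) and its internal DTD subset are all hypothetical. Because the DTD declares howto and topic as element-only content, a validating parser should route the indentation to ignorableWhitespace() and leave only the title text for characters().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DtdIgnorableWhitespaceDemo {

    // Hypothetical sample: the internal DTD subset declares element-only content
    // for howto and topic, and #PCDATA for title.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE howto [\n"
        + "  <!ELEMENT howto (topic+)>\n"
        + "  <!ELEMENT topic (title)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "]>\n"
        + "<howto>\n"
        + "  <topic>\n"
        + "      <title>Java</title>\n"
        + "  </topic>\n"
        + "</howto>\n";

    static final class Recorder extends DefaultHandler {
        final StringBuilder text = new StringBuilder(); // everything passed to characters()
        int ignorableChunks = 0;                        // number of ignorableWhitespace() calls

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorableChunks++;
        }
    }

    static Recorder parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // DTD validation; no XSD is involved here
            SAXParser parser = factory.newSAXParser();
            Recorder recorder = new Recorder();
            // This overload also registers the DefaultHandler as the ErrorHandler
            parser.parse(new InputSource(new StringReader(xml)), recorder);
            return recorder;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Recorder r = parse(XML);
        System.out.println("characters = \"" + r.text + "\"");
        System.out.println("ignorableWhitespace calls = " + r.ignorableChunks);
    }
}
```

With either the JDK's built-in parser or Apache Xerces, this should report characters = "Java" and several ignorableWhitespace() calls. Remove the DOCTYPE and the same indentation would arrive through characters() instead.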

If XSD wanted something similar it would need to set the [element content 
whitespace] boolean property [2] to true on character information items, 
but there's nothing in the XSD specification which suggests that XML 
schema processors are supposed to mutate the XML Infoset in this way.
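An application that validates against an XSD and only needs to skip indentation can classify the whitespace itself. The following is a minimal sketch of one possible heuristic, not something defined by SAX, XSD or Xerces. It buffers character data between element events and treats a whitespace-only run as formatting only when its enclosing element also has child elements. It assumes element-only or simple content: in mixed content it would wrongly discard whitespace between inline elements. The class, field and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class FormattingWhitespaceSketch {

    /** Hypothetical handler: separates text content from indentation-like whitespace. */
    static final class Handler extends DefaultHandler {
        final List<String> text = new ArrayList<>(); // runs kept as text content
        int formattingRuns = 0;                      // whitespace-only runs treated as formatting
        private final StringBuilder buf = new StringBuilder();
        // One flag per open element: has it had a child element yet?
        private final Deque<Boolean> hasChild = new ArrayDeque<>();

        @Override
        public void characters(char[] ch, int start, int length) {
            buf.append(ch, start, length); // SAX may split one text run across several calls
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            flush(true); // buffered text belongs to the parent, which now has a child
            if (!hasChild.isEmpty()) {
                hasChild.pop();
                hasChild.push(Boolean.TRUE);
            }
            hasChild.push(Boolean.FALSE);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flush(hasChild.pop()); // did the element being closed contain child elements?
        }

        private void flush(boolean enclosingHasChildren) {
            if (buf.length() == 0) {
                return;
            }
            String run = buf.toString();
            buf.setLength(0);
            if (enclosingHasChildren && run.trim().isEmpty()) {
                formattingRuns++;
            } else {
                text.add(run);
            }
        }
    }

    static Handler parse(String xml) {
        try {
            Handler h = new Handler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), h);
            return h;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Handler h = parse("<howto>\n  <topic>\n      <title>Java</title>\n  </topic>\n</howto>");
        System.out.println("text = " + h.text + ", formatting runs = " + h.formattingRuns);
    }
}
```

Whitespace-only text in a leaf element such as `<title> </title>` is still kept as text, because that element never gets a child and its flag stays false.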

Thanks.

[1] http://www.w3.org/TR/REC-xml/#proc-types
[2] http://www.w3.org/TR/xml-infoset/#infoitem.character

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

"Zhu, Joe" <joe....@lmco.com> wrote on 09/15/2014 11:37:13 AM:

> Hmm. Why does it distinguish between DTD and XSD? They are both
> schema definitions!
> 
> It is useful to be able to distinguish between ignorable whitespace
> and significant whitespace. Xerces can't, which makes it less useful.
> It also violates the W3C XML Recommendation, as shown below.
> 
> Everything I read implies that ignorable whitespace must be
> reported via ignorableWhitespace():
> 
> ------------------------ org.xml.sax.ContentHandler API (http://docs.oracle.com/javase/6/docs/api/index.html?javax/xml/stream/package-summary.html) ----------------
> 
> ignorableWhitespace
> ---
> Validating Parsers must use this method to report each chunk of 
> whitespace in element content (see the W3C XML 1.0 recommendation, 
> section 2.10): non-validating parsers may also use this method if 
> they are capable of parsing and using content models.
> 
> --------------------- W3C XML Recommendation (http://www.w3.org/TR/REC-xml/#sec-white-space) ---------------------
> 2.10 White Space Handling
> ...
> An XML processor MUST always pass all characters in a document that 
> are not markup through to the application. A validating XML 
> processor MUST also inform the application which of these characters
> constitute white space appearing in element content.
> 
> ---------------------------------------------------------------------
> 
> Joe 
> 
> -----Original Message-----
> From: Michael Glavassevich [mailto:mrgla...@ca.ibm.com] 
> Sent: Monday, September 15, 2014 9:40 AM
> To: j-users@xerces.apache.org
> Subject: RE: EXTERNAL: Re: SAX Parser includes ignorable whitespaces
> in the character() method
> 
> ignorableWhitespace() was only defined for use with DTDs. Sun's 
> implementation may be doing something for XSD, but there's nothing in
> the specification which requires that. Xerces is behaving correctly.
> 
> "Zhu, Joe" <joe....@lmco.com> wrote on 09/15/2014 09:41:33 AM:
> 
> > Michael,
> > Thanks for your reply. The XSD does not allow mixed content.
> > Attached are my test Java code, test XML, and test XSD for your
> > reference.
> > 
> > Also included below are the run logs for the Xerces parser and for a
> > Sun parser. When it runs with the Xerces parser, the whitespace is
> > reported in the characters() method and nothing is reported in
> > ignorableWhitespace(). But when it runs with the Sun parser, the text
> > content is reported in characters() and the whitespace is reported in
> > the ignorableWhitespace() method, as expected.
> > 
> > Joe
> > 
> > ------------------------ Log for Xerces parser ---------------------------
> > factory = org.apache.xerces.jaxp.SAXParserFactoryImpl@110c424
> > parser = org.apache.xerces.jaxp.SAXParserImpl@1bd2664
> > startElement howto
> > characters = "
> >   "
> > startElement topic
> > characters = "
> >       "
> > startElement title
> > characters = "Java"
> > endElement title
> > characters = "
> >       "
> > ...
> > 
> > ---------------------- Log for Sun parser ---------------------------------
> > factory = com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl@1e8a1f6
> > parser = com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl@1e152c5
> > startElement howto
> > ignorableWhitespace = "
> >   "
> > startElement topic
> > ignorableWhitespace = "
> >       "
> > startElement title
> > characters = "Java"
> > endElement title
> > ignorableWhitespace = "
> >       "
> > ...
> > 
> > 
> > -----Original Message-----
> > From: Michael Glavassevich [mailto:mrgla...@ca.ibm.com]
> > Sent: Friday, September 12, 2014 9:54 AM
> > To: j-users@xerces.apache.org
> > Subject: EXTERNAL: Re: SAX Parser includes ignorable whitespaces in 
> > the character() method
> > 
> > Your XML document requires a DTD with element declarations which 
> > specify that they contain element-only content. Without that, a SAX 
> > parser cannot determine which whitespace is 'ignorable'.
> > 
> > Thanks.
> > 
> > "Zhu, Joe" <joe....@lmco.com> wrote on 09/11/2014 07:00:11 PM:
> > 
> > > I am writing an app which needs to access all text content in XML. 
> > > According to the ContentHandler API, this could be accomplished by 
> > > using a validating parser and the characters() method.
> > > 
> > > But with the Xerces parser, the characters() method can receive
> > > ignorable whitespace (XML formatting whitespace). I have no way to
> > > tell whether the whitespace is ignorable or is part of the XML
> > > content.
> > > 
> > > Has anybody else run into this problem? I tested with both Xerces
> > > 2.9.1 and Xerces 2.11. They behave the same way.
> > > 
> > > Joe Zhu
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
> > For additional commands, e-mail: j-users-h...@xerces.apache.org