The specification you are quoting from is only concerned with DTDs. See the definition of validating XML processors here [1].
If XSD wanted something similar, it would need to set the [element content whitespace] boolean property [2] to true on character information items, but nothing in the XSD specification suggests that XML schema processors are supposed to mutate the XML Infoset in this way.

Thanks.

[1] http://www.w3.org/TR/REC-xml/#proc-types
[2] http://www.w3.org/TR/xml-infoset/#infoitem.character

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

"Zhu, Joe" <joe....@lmco.com> wrote on 09/15/2014 11:37:13 AM:

> Hmm. Why does it distinguish between DTD and XSD? They are all
> schema definitions!
>
> It is useful to be able to distinguish between ignorable whitespaces
> and allowable whitespaces. Xerces can't, which makes it less useful.
> It also violates the W3C XML Recommendation, as shown below.
>
> Everywhere I read, it implies that ignorable whitespaces shall be
> reported in ignorableWhitespace():
>
> ------------------------ org.xml.sax.ContentHandler API (http://docs.oracle.com/javase/6/docs/api/index.html?javax/xml/stream/package-summary.html) ----------------
>
> ignorableWhitespace
> ---
> Validating Parsers must use this method to report each chunk of
> whitespace in element content (see the W3C XML 1.0 recommendation,
> section 2.10): non-validating parsers may also use this method if
> they are capable of parsing and using content models.
>
> --------------------- W3C XML Recommendation (http://www.w3.org/TR/REC-xml/#sec-white-space) ---------------------
> 2.10 White Space Handling
> ...
> An XML processor MUST always pass all characters in a document that
> are not markup through to the application.
> A validating XML processor MUST also inform the application which of
> these characters constitute white space appearing in element content
> ---------------------------------------------------------------------
>
> Joe
>
> -----Original Message-----
> From: Michael Glavassevich [mailto:mrgla...@ca.ibm.com]
> Sent: Monday, September 15, 2014 9:40 AM
> To: j-users@xerces.apache.org
> Subject: RE: EXTERNAL: Re: SAX Parser includes ignorable whitespaces in the character() method
>
> ignorableWhitespace() was only defined for use with DTDs. Sun's
> implementation may be doing something for XSD but there's nothing in
> the specification which requires that. Xerces is behaving correctly.
>
> Michael Glavassevich
> XML Technologies and WAS Development
> IBM Toronto Lab
> E-mail: mrgla...@ca.ibm.com
> E-mail: mrgla...@apache.org
>
> "Zhu, Joe" <joe....@lmco.com> wrote on 09/15/2014 09:41:33 AM:
>
> > Michael,
> > Thanks for your reply. The XSD does not allow mixed content.
> > Attached is my test Java code, test xml and test xsd for your reference.
> >
> > Also included below is the run log for Xerces parser and for a Sun parser.
> > When it runs with the Xerces parser, the whitespaces are reported in
> > the characters() method and nothing is reported in ignorableWhitespace().
> > But when it runs with the Sun parser, the text content is reported in
> > characters() and the whitespaces are reported in the
> > ignorableWhitespace() method, as expected.
> >
> > Joe
> >
> > ------------------------ Log for Xerces parser ---------------------------
> > factory = org.apache.xerces.jaxp.SAXParserFactoryImpl@110c424
> > parser = org.apache.xerces.jaxp.SAXParserImpl@1bd2664
> > startElement howto
> > characters = "
> > "
> > startElement topic
> > characters = "
> > "
> > startElement title
> > characters = "Java"
> > endElement title
> > characters = "
> > "
> > ...
> >
> > ---------------------- Log for Sun parser ---------------------------------
> > factory = com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl@1e8a1f6
> > parser = com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl@1e152c5
> > startElement howto
> > ignorableWhitespace = "
> > "
> > startElement topic
> > ignorableWhitespace = "
> > "
> > startElement title
> > characters = "Java"
> > endElement title
> > ignorableWhitespace = "
> > "
> > ...
> >
> >
> > -----Original Message-----
> > From: Michael Glavassevich [mailto:mrgla...@ca.ibm.com]
> > Sent: Friday, September 12, 2014 9:54 AM
> > To: j-users@xerces.apache.org
> > Subject: EXTERNAL: Re: SAX Parser includes ignorable whitespaces in the character() method
> >
> > Your XML document requires a DTD with element declarations which
> > specify that they contain element-only content. Without that a SAX
> > parser cannot determine which whitespaces are 'ignorable'.
> >
> > Thanks.
> >
> > Michael Glavassevich
> > XML Technologies and WAS Development
> > IBM Toronto Lab
> > E-mail: mrgla...@ca.ibm.com
> > E-mail: mrgla...@apache.org
> >
> > "Zhu, Joe" <joe....@lmco.com> wrote on 09/11/2014 07:00:11 PM:
> >
> > > I am writing an app which needs to access all text content in XML.
> > > According to the ContentHandler API, this could be accomplished by
> > > using a validating parser and the characters() method.
> > >
> > > But with the Xerces parser, the characters() method could contain
> > > ignorable whitespaces (XML formatting whitespaces). I have no way to
> > > tell if the whitespace is ignorable whitespace or is part of the XML content.
> > >
> > > Has anybody else run into the problem? I tested with both Xerces
> > > 2.9.1 and Xerces 2.11. They behave the same way.
> > >
> > > Joe Zhu

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
For additional commands, e-mail: j-users-h...@xerces.apache.org
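
For anyone following the thread, here is a minimal sketch of the DTD case discussed above, where a SAX parser has the content models it needs to tell element content whitespace apart from character data. It assumes the default JAXP parser on the classpath. The class name, the inline document, and its DTD are hypothetical, modeled on the element names in the run logs; the original test files were attachments and are not reproduced here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not the test code attached to the thread.
class IgnorableWhitespaceDemo {

    // Assumed document: the internal DTD subset declares howto and topic
    // as element-only content, and title as character data.
    static final String XML_WITH_DTD =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE howto [\n"
        + "  <!ELEMENT howto (topic+)>\n"
        + "  <!ELEMENT topic (title)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "]>\n"
        + "<howto>\n"
        + "  <topic>\n"
        + "    <title>Java</title>\n"
        + "  </topic>\n"
        + "</howto>\n";

    // Parses the document with DTD validation on and records each SAX event.
    static List<String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // DTD validation, not XSD validation
        SAXParser parser = factory.newSAXParser();

        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("startElement:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("characters:" + new String(ch, start, length));
            }

            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                events.add("ignorableWhitespace:" + new String(ch, start, length));
            }
        };

        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : parse(XML_WITH_DTD)) {
            // Escape newlines so each event prints on one line.
            System.out.println(event.replace("\n", "\\n"));
        }
    }
}
```

With the element declarations in place, the formatting whitespace between howto, topic, and title should arrive through ignorableWhitespace() rather than characters(). Removing the DOCTYPE from the string is a quick way to see the whitespace move back into characters().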