Did you mean Xml*Strip*CharFilter?
koji
--
http://www.rondhuit.com/en/
(11/06/15 22:12), Mike Sokolov (JIRA) wrote:
XmlCharFilter
-------------
Key: SOLR-2597
URL: https://issues.apache.org/jira/browse/SOLR-2597
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Mike Sokolov
This CharFilter processes incoming XML using the Woodstox parser, stripping all
non-text content and remembering offsets, just like HTMLCharFilter, but
respecting XML conventions like XML entities defined in a DTD. XmlCharFilter
also provides the ability to exclude (and include) the content of certain named
elements.
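As a rough sketch of the stripping and element-exclusion behaviour described above (not the patch's code): the hypothetical class XmlTextExtractor below uses the JDK's built-in StAX API (javax.xml.stream, the same streaming API that Woodstox implements) rather than depending on Woodstox directly. It keeps only character data and drops everything inside elements whose local name is in an exclusion set. It does not track offsets.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative sketch only (hypothetical class, not the SOLR-2597 patch).
// Uses the JDK's StAX implementation; offsets are NOT tracked here.
public class XmlTextExtractor {

    /** Returns the text content, skipping excluded elements and all their descendants. */
    public static String extractText(String xml, Set<String> excluded) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Expand entity references so "&lt;" is delivered as "<".
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
        // Ask the parser to merge adjacent text into single events.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        StringBuilder out = new StringBuilder();
        int excludedDepth = 0; // > 0 while inside an excluded element
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (excludedDepth > 0 || excluded.contains(reader.getLocalName())) {
                            excludedDepth++;
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (excludedDepth > 0) {
                            excludedDepth--;
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                    case XMLStreamConstants.SPACE:
                        if (excludedDepth == 0) {
                            out.append(reader.getText());
                        }
                        break;
                    default:
                        // Comments, processing instructions, DTD events, etc. are dropped.
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<doc><title>A &lt; B</title><note>skip me</note><p>kept</p></doc>";
        System.out.println(extractText(xml, Set.of("note")));
    }
}
```

Tracking depth rather than a boolean flag keeps exclusion correct when an excluded element contains nested child elements.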
In order to compute character offsets properly when mixed line termination styles are present (\r, \r\n), or when XML
character entities (&lt;, &quot;, &amp;) are present, we require a newer version of Woodstox (4.1.1) than is
currently in solr/lib. The earlier versions of the parser could not report these entity events, so we couldn't tell
the difference between "&lt;" and "<", and the offsets could be wrong. The upgraded version
is in the patch.
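A minimal hand-rolled sketch of why these events matter for offsets, assuming a hypothetical class OffsetTrackingStripper that understands only tags, the five predefined XML entities and XML 1.0 line-end normalization (no DTD entities, CDATA, comments, or '>' inside attribute values). For every output character it records where its source began and ended in the input, so "&lt;" (four input chars, one output char) and "\r\n" (two input chars, one output char) map back correctly. This is not how Woodstox or the patch does it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only (hypothetical class, not the SOLR-2597 patch).
// Handles tags, the five predefined entities and "\r\n" or "\r" -> "\n";
// no DTD entities, CDATA, comments, or '>' inside attribute values.
public class OffsetTrackingStripper {
    private static final Map<String, Character> PREDEFINED = Map.of(
            "lt", '<', "gt", '>', "amp", '&', "quot", '"', "apos", '\'');

    private final StringBuilder text = new StringBuilder();
    private final List<Integer> starts = new ArrayList<>(); // input start of output char i
    private final List<Integer> ends = new ArrayList<>();   // input end (exclusive) of output char i

    public OffsetTrackingStripper(String xml) {
        int i = 0;
        while (i < xml.length()) {
            char c = xml.charAt(i);
            if (c == '<') {
                // Markup produces no output, so later offsets shift by its length.
                int close = xml.indexOf('>', i);
                if (close < 0) throw new IllegalArgumentException("unterminated tag at " + i);
                i = close + 1;
            } else if (c == '&') {
                // "&lt;" is four input chars but only one output char.
                int semi = xml.indexOf(';', i);
                Character decoded = semi < 0 ? null : PREDEFINED.get(xml.substring(i + 1, semi));
                if (decoded == null) throw new IllegalArgumentException("unsupported entity at " + i);
                emit(decoded, i, semi + 1);
                i = semi + 1;
            } else if (c == '\r') {
                // XML 1.0 normalizes a "\r\n" pair or a lone "\r" to a single "\n".
                int len = (i + 1 < xml.length() && xml.charAt(i + 1) == '\n') ? 2 : 1;
                emit('\n', i, i + len);
                i += len;
            } else {
                emit(c, i, i + 1);
                i++;
            }
        }
    }

    private void emit(char c, int inputStart, int inputEnd) {
        text.append(c);
        starts.add(inputStart);
        ends.add(inputEnd);
    }

    public String text() {
        return text.toString();
    }

    /** Input offset where the output character at outputStart came from. */
    public int correctStart(int outputStart) {
        return starts.get(outputStart);
    }

    /** Input offset just past the source of output character outputEnd - 1 (requires outputEnd >= 1). */
    public int correctEnd(int outputEnd) {
        return ends.get(outputEnd - 1);
    }

    public static void main(String[] args) {
        String xml = "<p>a &lt; b</p>";
        OffsetTrackingStripper s = new OffsetTrackingStripper(xml);
        System.out.println(s.text());
        System.out.println(xml.substring(s.correctStart(4), s.correctEnd(5)));
    }
}
```

For the input "<p>a &lt; b</p>", the token "b" sits at output offsets 4 to 5, and correctStart(4) and correctEnd(5) return 10 and 11. Accounting only for the "<p>" tag and ignoring the entity would place it three characters too early.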
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]