Hi David,

Thanks for your quick reply.

In fact, we do use WDF in 4.10.2. It very much looks as you explain, that the offsets are preserved in the monotonically increasing order. Here is the list of filters we use on the indexing side:

solr.MappingCharFilterFactory
solr.StandardTokenizerFactory
solr.StandardFilterFactory
solr.WordDelimiterFilterFactory
solr.LowerCaseFilterFactory
custom filters that do not mingle with the order of the offsets.

On 4 June 2015 at 18:35, [email protected] <[email protected]> wrote:

> Hi Dmitry,
>
> Ideally, the token stream produces tokens that have a startOffset >= the
> startOffset of the previous token from the stream. Sometime in the past
> year or so, this was enforced at the indexing layer, I think. There used
> to be TokenFilters that violated this contract; I think earlier versions of
> WordDelimiterFilter could. If my assumption that this is asserted at the
> indexing layer is correct, then I think TokenOrderingFilter is obsolete.
>
> ~ David
>
> On Thu, Jun 4, 2015 at 7:48 AM Dmitry Kan <[email protected]> wrote:
>
>> Hi guys,
>>
>> Sorry for sending questions to the dev list and not to the user one.
>> Somehow I'm getting more luck here.
>>
>> We have found the class o.a.solr.highlight.TokenOrderingFilter
>> with the following comment:
>>
>> -/**
>> - * Orders Tokens in a window first by their startOffset ascending.
>> - * endOffset is currently ignored.
>> - * This is meant to work around fickleness in the highlighter only. It
>> - * can mess up token positions and should not be used for indexing or querying.
>> - */
>> -final class TokenOrderingFilter extends TokenFilter {
>>
>> In fact, removing this class didn't change the behaviour of the highlighter.
>>
>> Could anybody shed light on its necessity?
>>
>> Thanks,
>>
>> Dmitry Kan
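
For illustration, the ordering contract discussed in this thread (each token's startOffset is >= the startOffset of the previous token) can be checked with a small standalone sketch. This is plain Java, not Lucene or Solr code: the class name OffsetOrderCheck and the method firstBackwardsOffset are hypothetical, and the start offsets are assumed to have been collected from a token stream beforehand.

```java
public class OffsetOrderCheck {

    /**
     * Returns the index of the first token whose start offset is smaller
     * than the start offset of the token before it, or -1 if the offsets
     * never go backwards. Equal consecutive offsets are allowed, since
     * several tokens may start at the same position in the text.
     */
    static int firstBackwardsOffset(int[] startOffsets) {
        for (int i = 1; i < startOffsets.length; i++) {
            if (startOffsets[i] < startOffsets[i - 1]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical offsets: two tokens share start offset 0, order is fine.
        int[] ordered = {0, 0, 5, 9};
        // Hypothetical offsets: the third token starts before the second one.
        int[] backwards = {0, 5, 3, 9};

        System.out.println(firstBackwardsOffset(ordered));
        System.out.println(firstBackwardsOffset(backwards));
    }
}
```

If a check like this never reports a violation for the analysis chain in use, a reordering step such as TokenOrderingFilter would have nothing to reorder, which matches the observation that removing it did not change the highlighter's behaviour.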
