On one of my other open-source projects (SolrTextTagger), I have a test that deliberately exercises the effect of a very long token with the StandardTokenizer, and that project is in turn tested against a wide matrix of Lucene/Solr versions. Before Lucene 4.9, a token that exceeded maxTokenLength (255 by default) created a skipped position, basically a pseudo-stop-word. Since 4.9 this no longer happens: the JFlex scanner never reports a token longer than 255 characters. I checked our code coverage, and sure enough "skippedPositions++" is never executed:
https://builds.apache.org/job/Lucene-Solr-Clover-trunk/lastSuccessfulBuild/clover-report/org/apache/lucene/analysis/standard/StandardTokenizer.html?line=167#src-167

Any thoughts on this, Steve?

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley
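P.S. For anyone less familiar with what a "skipped position" means here, the pre-4.9 behavior can be sketched in dependency-free Java (this is not Lucene code; the class and method names are illustrative). An over-long token is dropped and the next token's position increment grows by one, leaving the same kind of hole a removed stop word would:

```java
import java.util.ArrayList;
import java.util.List;

public class SkippedPositionSketch {
    static final int MAX_TOKEN_LENGTH = 255; // StandardTokenizer's default

    // A (term, positionIncrement) pair; names are illustrative, not Lucene's.
    record Token(String term, int positionIncrement) {}

    static List<Token> tokenize(String text) {
        List<Token> out = new ArrayList<>();
        int skippedPositions = 0;
        for (String word : text.split("\\s+")) {
            if (word.isEmpty()) continue;
            if (word.length() > MAX_TOKEN_LENGTH) {
                // pre-4.9: the over-long token became a skipped position (a hole);
                // since 4.9 the JFlex scanner never emits such a token at all.
                skippedPositions++;
                continue;
            }
            out.add(new Token(word, 1 + skippedPositions));
            skippedPositions = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        String longToken = "x".repeat(300); // exceeds the 255-char default
        for (Token t : tokenize("before " + longToken + " after")) {
            System.out.println(t.term() + " posInc=" + t.positionIncrement());
        }
        // "after" gets posInc=2, reflecting the hole left by the dropped token
    }
}
```

Since 4.9 that `skippedPositions++` branch is unreachable from an over-long input, which is exactly what the Clover report above shows.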
