We are about to upgrade to Solr/Lucene 3.3 from a 3.1dev version (Lucene Implementation Version: 3.1-SNAPSHOT 1036094 - 2010-11-19 16:01:10).
We have a 6+ TB index covering somewhere over 200 languages. It was indexed with the ICUTokenizer and ICUFoldingFilter from 3.1dev, and we would like to avoid re-indexing if possible.

The change list includes this entry:

LUCENE-3149 <http://issues.apache.org/jira/browse/LUCENE-3149>: Upgrade contrib/icu's ICU jar file to ICU 4.8.

I couldn't tell from the ICU 4.8 release notes whether the changes affected only internal APIs or also the actual rules for tokenizing or folding.

Do the changes to the ICU filters/tokenizers in Solr/Lucene 3.3 change how tokenizing and folding work? Specifically, could queries run through the 3.3 filters fail to match documents that were indexed with the 3.1dev filters?

Tom Burton-West