Among the many changes in Apache Lucene 5, it is no longer possible to override the Analyzer on a per-document basis.
You have to pick a single Analyzer when opening the IndexWriter. Of course the Analyzer can still return a different tokenization chain for each field, but the field->tokenizer mapping has to be consistent for the lifecycle of the IndexWriter.

This means we might need to drop our "Dynamic Analyzer" feature:
http://docs.jboss.org/hibernate/search/5.4/reference/en-US/html_single/#_dynamic_analyzer_selection

I did ask to restore the functionality:
https://issues.apache.org/jira/browse/LUCENE-6212

So, the alternatives I'm seeing:

 # Drop the Dynamic Analyzer feature
 # Cheat and pass in a mutable Analyzer - needs some caution regarding concurrent usage
 # Cheat and pass in a pre-analyzed Document
 # Fork & patch the IndexWriter

Patching the functionality back into Lucene is trivial, but the Lucene team needs to agree on the use case, and then it would take a long time to reach a release.

We should discuss both a short-term solution and the better long-term solution.

My favourite long-term solution would be to do pre-analysis: in our master/slave clustering approach, that would have several other benefits:

 - move the analyzer work to the slaves
 - reduce the network payloads
 - remove the need to be able to serialize analyzers

But I'd prefer to do this in a second "polishing phase" rather than consider such a backend rewrite as a blocker for Lucene 5.

WDYT?

Thanks,
Sanne
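
For illustration, the "mutable Analyzer" alternative and its concurrency caveat could be sketched roughly as below. This is only a plain-Java model: `SimpleAnalyzer`, `SwitchableAnalyzer` and `DynamicAnalyzerSketch` are hypothetical stand-ins invented for this sketch, not Lucene or Hibernate Search classes. The assumption is that the single instance handed to the writer delegates to a per-document choice kept in a `ThreadLocal`, so that concurrent indexing threads do not see each other's selection.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an analyzer: turns the text of a field into tokens.
// This is NOT the Lucene Analyzer API, only a model of the idea.
interface SimpleAnalyzer {
    List<String> tokenize(String field, String text);
}

// The one instance that would be passed in when the writer is opened.
// Its delegate can be switched per document, but only for the calling thread.
final class SwitchableAnalyzer implements SimpleAnalyzer {
    private final SimpleAnalyzer defaultAnalyzer;
    private final ThreadLocal<SimpleAnalyzer> override = new ThreadLocal<>();

    SwitchableAnalyzer(SimpleAnalyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Select the analyzer for the document the current thread is about to index.
    void use(SimpleAnalyzer perDocument) {
        override.set(perDocument);
    }

    // Go back to the default once the document has been handled.
    void reset() {
        override.remove();
    }

    @Override
    public List<String> tokenize(String field, String text) {
        SimpleAnalyzer current = override.get();
        return (current != null ? current : defaultAnalyzer).tokenize(field, text);
    }
}

final class DynamicAnalyzerSketch {
    // Two toy analyzers: split on whitespace, with and without lowercasing.
    static final SimpleAnalyzer WHITESPACE =
            (field, text) -> Arrays.asList(text.trim().split("\\s+"));
    static final SimpleAnalyzer LOWERCASE =
            (field, text) -> Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));

    // Models indexing one document with a per-document analyzer choice.
    // The try/finally keeps the override from leaking into the next document.
    static List<String> indexDocument(SwitchableAnalyzer shared, SimpleAnalyzer perDocument,
                                      String field, String text) {
        shared.use(perDocument);
        try {
            return shared.tokenize(field, text);
        } finally {
            shared.reset();
        }
    }

    public static void main(String[] args) {
        SwitchableAnalyzer shared = new SwitchableAnalyzer(WHITESPACE);
        System.out.println(indexDocument(shared, LOWERCASE, "title", "Hello Lucene"));
        System.out.println(shared.tokenize("title", "Hello Lucene"));
    }
}
```

The `ThreadLocal` is the part that addresses the concurrency concern in this model. A real implementation would also have to account for Lucene's own per-thread reuse of analysis components, which this sketch ignores.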