identifier n-gram tokenizer

2016-01-11 Thread Michal Hlavac
Hello, I published some token filters that can be used to tokenize certain kinds of identifiers into punctuation-delimited n-grams (e.g. IP addresses). I think it needs some optimization, but it works for now. https://github.com/hlavki/lucene-analyzers You can find an example of usage in the unit tests: https
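To make the idea concrete, here is a minimal plain-Java sketch of what punctuation-delimited n-grams of an identifier could look like. This is a hypothetical illustration of the concept, not the code from the linked hlavki/lucene-analyzers repository; the class and method names are invented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class IdentifierNgrams {
    // Split an identifier on a punctuation delimiter and emit every
    // contiguous run of parts re-joined with that delimiter.
    // For "192.168.1.1" with "." this yields "192", "192.168",
    // "192.168.1", "192.168.1.1", "168", "168.1", and so on.
    public static List<String> ngrams(String identifier, String delimiter) {
        String[] parts = identifier.split(Pattern.quote(delimiter));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            StringBuilder sb = new StringBuilder(parts[i]);
            out.add(sb.toString());
            for (int j = i + 1; j < parts.length; j++) {
                sb.append(delimiter).append(parts[j]);
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("192.168.1.1", "."));
    }
}
```

In a real Lucene token filter the same expansion would be done incrementally over the token stream rather than materialized into a list.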

How to make word-N-gram based query and interpolate each N-gram score to obtain final Lucene score

2016-01-11 Thread Rajen Chatterjee
Hello Everyone, I am looking for a method that can help me build *word-N-gram*-based queries. After some searching, I think I have to define an analyzer as follows: public static Analyzer wordNgramAnalyzer(final int minShingle, final int maxShingle) { return new Analyzer() {
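For readers unfamiliar with the terminology: in Lucene, word n-grams are called shingles and are produced by `ShingleFilter`, which is what a `wordNgramAnalyzer(minShingle, maxShingle)` would typically wrap. The sketch below is a plain-Java illustration of what such an analyzer emits, grouped by shingle size for readability (the actual `ShingleFilter` interleaves shingles by position and controls unigram output separately via `setOutputUnigrams`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordShingles {
    // Emit all word n-grams ("shingles") whose length in words is
    // between minShingle and maxShingle, inclusive.
    public static List<String> shingles(String text, int minShingle, int maxShingle) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int n = minShingle; n <= maxShingle; n++) {
            for (int i = 0; i + n <= words.length; i++) {
                out.add(String.join(" ", Arrays.asList(words).subList(i, i + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "the quick brown fox" with min=2, max=3 yields the three
        // bigrams followed by the two trigrams.
        System.out.println(shingles("the quick brown fox", 2, 3));
    }
}
```

Each shingle can then be issued as a phrase-like clause of a BooleanQuery, and per-shingle scores combined (e.g. by weighted interpolation) outside Lucene's default scoring if a custom final score is needed.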