Hello, I'm trying to improve the speed of an index when searching for long phrases. I ran some tests with the benchmark module. With a simple analyser and PhraseQueries, I get a throughput of 118 rec/sec. My test dataset is the latest dump of Wikipedia. Here are the filters I use at indexing and query time:

    var filter: TokenFilter = new StandardFilter(tokenizer)
    filter = new LowerCaseFilter(filter)
    filter = new EnglishPossessiveFilter(filter)
    filter = new StopFilter(filter, StopAnalyzer.ENGLISH_STOP_WORDS_SET)
    filter = new SnowballFilter(filter, "English")

To improve performance, I tried adding a ShingleFilter and benchmarked it with PhraseQueries and BooleanQueries (Should, Must). In both cases I got a lower throughput (83 rec/sec and 84 rec/sec, respectively). Here is the filter chain:

    var filter: TokenFilter = new StandardFilter(tokenizer)
    filter = new LowerCaseFilter(filter)
    filter = new EnglishPossessiveFilter(filter)
    filter = new StopFilter(filter, StopAnalyzer.ENGLISH_STOP_WORDS_SET)
    filter = new SnowballFilter(filter, "English")
    val shingleFilter = new ShingleFilter(filter, 2, 2)
    shingleFilter.setOutputUnigrams(false)
    filter = shingleFilter

From what I have read, performance should be better with shingles, but I am unable to get the desired results. Does anyone have advice on the best way to use shingles to improve performance? Should I use some other form of Query?

Thank you in advance for your help.

Regards,
Bertil