Hello,

I'm trying to improve the speed of an index when searching for long phrases. I
ran some tests with the benchmark module. With a simple analyzer and
PhraseQueries I get a throughput of 118 rec/sec. My test dataset is the
latest dump of Wikipedia. Here are the filters I use at indexing and query
time:

var filter: TokenFilter = new StandardFilter(tokenizer)
filter = new LowerCaseFilter(filter)
filter = new EnglishPossessiveFilter(filter)
filter = new StopFilter(filter, StopAnalyzer.ENGLISH_STOP_WORDS_SET)
filter = new SnowballFilter(filter, "English")

In order to improve performance I tried adding a ShingleFilter and ran
some benchmarks with PhraseQueries and BooleanQueries (Should, Must). In
both cases I got a lower throughput (83 rec/sec and 84 rec/sec respectively).
Here is the filter chain:

var filter: TokenFilter = new StandardFilter(tokenizer)
filter = new LowerCaseFilter(filter)
filter = new EnglishPossessiveFilter(filter)
filter = new StopFilter(filter, StopAnalyzer.ENGLISH_STOP_WORDS_SET)
filter = new SnowballFilter(filter, "English")
val shingleFilter = new ShingleFilter(filter, 2, 2)
shingleFilter.setOutputUnigrams(false)
filter = shingleFilter
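
For illustration, here is a minimal sketch in plain Scala of the terms a
chain like the one above is expected to emit with a shingle size of 2 and
unigrams disabled. It makes no Lucene calls: the `shingles` helper and the
sample tokens are hypothetical, a single space is assumed as the separator,
and the position gaps left by the StopFilter are ignored.

```scala
object ShingleSketch {
  // Hypothetical helper, not a Lucene API: joins every run of `size`
  // adjacent tokens into one term, as a shingle filter without unigrams
  // would. Inputs shorter than `size` produce no terms.
  def shingles(tokens: Seq[String], size: Int = 2, sep: String = " "): Seq[String] =
    if (tokens.length < size) Seq.empty
    else tokens.sliding(size).map(_.mkString(sep)).toList

  def main(args: Array[String]): Unit = {
    // Assumed output of the lowercase/stop/stem steps for some phrase.
    val analyzed = Seq("quick", "brown", "fox", "jump")
    println(shingles(analyzed).mkString(", "))
    // prints: quick brown, brown fox, fox jump
  }
}
```

Under these assumptions a phrase of n analyzed tokens becomes n - 1 bigram
terms, so the query text has to pass through the same chain to produce terms
that match what was indexed.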

From what I have read, performance should be better with shingles, but I'm
unable to get the desired results. Does anyone have advice on the best way to
use shingles to improve performance? Should I use some other form of Query?

Thank you in advance for your help.

Regards,

Bertil
