Shingles should make a huge difference in phrase query performance if 1) the phrase queries involve high frequency terms and 2) you have a substantial number of documents in the index (so that time-to-visit-postings dominates over time-to-lookup-terms).
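To make the point about high frequency terms concrete, here is a rough plain-Scala sketch. It is not Lucene's ShingleFilter and it ignores positions; the names `ShingleSketch`, `bigrams`, `postings`, and `candidates`, and the three toy documents, are made up for illustration. It only shows that a bigram term such as "of the" can have a shorter postings list than "of" or "the" alone.

```scala
// Toy illustration only: a hypothetical in-memory "index", not Lucene code.
object ShingleSketch {
  // Join each pair of adjacent tokens into a single bigram term.
  def bigrams(tokens: Seq[String]): List[String] =
    if (tokens.length < 2) Nil
    else tokens.sliding(2).map(_.mkString(" ")).toList

  // Map each term to the set of ids of the documents that contain it.
  def postings(docs: Seq[Seq[String]]): Map[String, Set[Int]] =
    docs.zipWithIndex
      .flatMap { case (terms, id) => terms.map(t => t -> id) }
      .groupBy(_._1)
      .map { case (term, hits) => term -> hits.map(_._2).toSet }

  // Three made-up documents, already tokenized and lowercased.
  val docs: Seq[Seq[String]] = Seq(
    "the state of the art".split(" ").toSeq,
    "the art of the deal".split(" ").toSeq,
    "the end of an era".split(" ").toSeq
  )

  val unigramIndex: Map[String, Set[Int]] = postings(docs)
  val bigramIndex: Map[String, Set[Int]] = postings(docs.map(bigrams))

  // Candidate documents for a phrase: intersect the postings of its bigrams.
  def candidates(phrase: Seq[String]): Set[Int] =
    bigrams(phrase)
      .map(term => bigramIndex.getOrElse(term, Set.empty[Int]))
      .reduceOption(_ intersect _)
      .getOrElse(Set.empty[Int])

  def main(args: Array[String]): Unit = {
    println(unigramIndex("the").size)   // postings visited for the unigram
    println(bigramIndex("of the").size) // postings visited for the bigram
    println(candidates(Seq("state", "of", "the", "art")))
  }
}
```

In this toy data "the" occurs in all three documents while "of the" occurs in only two, and the gap grows with the size of the index, which is why the benefit only shows up once visiting postings dominates the query time.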
118 rec/sec is already very fast for a long phrase on a large index ... how many documents are in your index?

You could also try using CommonGramsFilter instead: it's like shingles, but only for high frequency terms, so you get a smaller increase in index size but still big performance gains for the otherwise slow phrase queries.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jan 21, 2016 at 1:23 PM, Bertil Chapuis <bchap...@gmail.com> wrote:

> Hello,
>
> I'm trying to improve the speed of an index when searching for long phrases. I
> performed some tests with the benchmark module. With a simple analyzer and
> PhraseQueries, I get a throughput of 118 rec/sec. My test dataset is the
> latest dump of Wikipedia. Here are the filters I use at indexing and query
> time:
>
> var filter: TokenFilter = new StandardFilter(tokenizer)
> filter = new LowerCaseFilter(filter)
> filter = new EnglishPossessiveFilter(filter)
> filter = new StopFilter(filter, StopAnalyzer.ENGLISH_STOP_WORDS_SET)
> filter = new SnowballFilter(filter, "English")
>
> In order to improve performance, I tried adding a ShingleFilter and ran
> some benchmarks with PhraseQueries and BooleanQueries (Should, Must). In
> both cases I got a lower throughput (83 rec/sec and 84 rec/sec, respectively).
> Here is the filter:
>
> var filter: TokenFilter = new StandardFilter(tokenizer)
> filter = new LowerCaseFilter(filter)
> filter = new EnglishPossessiveFilter(filter)
> filter = new StopFilter(filter, StopAnalyzer.ENGLISH_STOP_WORDS_SET)
> filter = new SnowballFilter(filter, "English")
> val shingleFilter = new ShingleFilter(filter, 2, 2)
> shingleFilter.setOutputUnigrams(false)
> filter = shingleFilter
>
> From what I read, the performance should be better, but I'm unable to get
> the desired results. Does anyone have advice on the best way to use shingles
> to improve performance? Should I use some other form of Query?
>
> Thank you in advance for your help.
>
> Regards,
>
> Bertil
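Coming back to the CommonGramsFilter suggestion, the idea can be sketched in plain Scala as follows. This is only an approximation of the concept, not Lucene's implementation: `CommonGramsSketch` and `commonGrams` are hypothetical names, the "_" separator is an illustrative choice, and token positions are ignored. Every unigram is kept, and a bigram is added only where at least one of the two adjacent tokens is in the common-word set.

```scala
// Concept sketch only: a hypothetical helper, not Lucene's CommonGramsFilter.
object CommonGramsSketch {
  // Emit each token, followed by a bigram with the next token
  // when either member of the pair is a common (high frequency) word.
  def commonGrams(tokens: Seq[String], common: Set[String]): List[String] =
    tokens.indices.toList.flatMap { i =>
      val unigram = tokens(i)
      val bigram =
        if (i + 1 < tokens.length && (common(unigram) || common(tokens(i + 1))))
          List(unigram + "_" + tokens(i + 1))
        else Nil
      unigram :: bigram
    }

  def main(args: Array[String]): Unit = {
    val common = Set("the", "of")
    println(commonGrams(Seq("the", "quick", "brown", "fox"), common).mkString(", "))
    println(commonGrams(Seq("state", "of", "the", "art"), common).mkString(", "))
  }
}
```

Pairs of rare words such as "quick brown" produce no extra term, so the index grows much less than with shingles over every adjacent pair, while phrases that contain the expensive common words still get a dedicated bigram term.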