Be sure to check whether your app is compute-bound or I/O-bound during this process. If too little of your index is cached in system memory, each query requires I/O, and lots of it.
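One rough way to tell the two cases apart from inside the JVM is to compare wall-clock time with CPU time for the thread that runs the queries. The sketch below is only an illustration: `BoundCheck`, `measure` and `cpuFraction` are made-up names, and the `Thread.sleep` call is a stand-in for whatever search call you actually benchmark. A CPU fraction close to 1 suggests the work is compute-bound, while a low fraction suggests the thread spends most of its time waiting, which is often I/O. It only accounts for the calling thread, so treat the result as a hint and confirm with OS-level tools.

```scala
import java.lang.management.ManagementFactory

object BoundCheck {
  // Standard JMX bean that exposes per-thread CPU time.
  private val threads = ManagementFactory.getThreadMXBean

  /** Runs `work` once and returns (wall-clock nanos, CPU nanos) for the calling thread. */
  def measure(work: () => Unit): (Long, Long) = {
    val cpu0 = threads.getCurrentThreadCpuTime
    val wall0 = System.nanoTime()
    work()
    val wall = System.nanoTime() - wall0
    val cpu = threads.getCurrentThreadCpuTime - cpu0
    (wall, cpu)
  }

  /** Fraction of wall-clock time that was spent on the CPU. */
  def cpuFraction(wallNanos: Long, cpuNanos: Long): Double =
    if (wallNanos <= 0) 0.0 else cpuNanos.toDouble / wallNanos.toDouble

  def main(args: Array[String]): Unit = {
    // Hypothetical workload: sleeping stands in for a query that waits on disk.
    val (wall, cpu) = measure(() => Thread.sleep(50))
    println(f"wall=${wall / 1e6}%.1f ms, cpu=${cpu / 1e6}%.1f ms, cpuFraction=${cpuFraction(wall, cpu)}%.2f")
  }
}
```

To use it on the benchmark, replace the body passed to `measure` with one batch of queries and compare a run right after a restart (cold cache) with a later run (warm cache).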
-- Jack Krupansky

On Thu, Jan 21, 2016 at 1:52 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> In my experience, shingles can hurt query performance because the term
> dictionary grows quite a bit. There are far more unique bigrams than there
> are words. While the lookup time doesn't grow linearly with the number of
> terms, it still grows.
>
> I haven't specifically compared performance numbers for shingles vs. phrase
> queries, but your numbers don't strike me as particularly shocking given
> the performance issues I've had in the past with larger term dictionary
> sizes.
>
> Hope that helps
> -Doug
>
> On Thu, Jan 21, 2016 at 1:23 PM, Bertil Chapuis <bchap...@gmail.com> wrote:
>
> > Hello,
> >
> > I'm trying to improve the speed of an index when searching for long
> > phrases. I performed some tests with the benchmark module. With a simple
> > analyzer and PhraseQueries, I get a throughput of 118 rec/sec. My test
> > dataset is the latest dump of Wikipedia. Here are the filters I use at
> > indexing and query time:
> >
> > var filter: TokenFilter = new StandardFilter(tokenizer)
> > filter = new LowerCaseFilter(filter)
> > filter = new EnglishPossessiveFilter(filter)
> > filter = new StopFilter(filter, StopAnalyzer.ENGLISH_STOP_WORDS_SET)
> > filter = new SnowballFilter(filter, "English")
> >
> > In order to improve performance, I tried adding a ShingleFilter and ran
> > some benchmarks with PhraseQueries and BooleanQueries (Should, Must). In
> > both cases I got a lower throughput (83 rec/sec and 84 rec/sec,
> > respectively).
> >
> > Here is the filter:
> >
> > var filter: TokenFilter = new StandardFilter(tokenizer)
> > filter = new LowerCaseFilter(filter)
> > filter = new EnglishPossessiveFilter(filter)
> > filter = new StopFilter(filter, StopAnalyzer.ENGLISH_STOP_WORDS_SET)
> > filter = new SnowballFilter(filter, "English")
> > val shingleFilter = new ShingleFilter(filter, 2, 2)
> > shingleFilter.setOutputUnigrams(false)
> > filter = shingleFilter
> >
> > From what I read, performance should be better, but I'm unable to get
> > the desired results. Does anyone have advice on the best way to use
> > shingles to improve performance? Should I use some other form of Query?
> >
> > Thank you in advance for your help.
> >
> > Regards,
> >
> > Bertil
>
> --
> Doug Turnbull | Search Relevance Consultant | OpenSource Connections
> <http://opensourceconnections.com>, LLC | 240.476.9983
> Author: Relevant Search <http://manning.com/turnbull>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.
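
For what it's worth, the term dictionary growth mentioned in the quoted reply is easy to see on a small sample before rebuilding a full index. This is a minimal sketch that does not use Lucene at all: `ShingleTermCount` and its methods are hypothetical helpers, and the regex tokenizer is a crude stand-in for the real analysis chain (no stop words, no stemming). It only counts distinct single terms against distinct two-word shingles.

```scala
object ShingleTermCount {
  /** Lowercases and splits on runs of non-alphanumerics; a crude stand-in for an analyzer. */
  def tokens(text: String): Vector[String] =
    text.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toVector

  /** Number of distinct single terms. */
  def uniqueUnigrams(ts: Seq[String]): Int = ts.distinct.size

  /** Number of distinct adjacent word pairs (two-word shingles). */
  def uniqueBigrams(ts: Seq[String]): Int =
    ts.sliding(2).filter(_.size == 2).map(_.mkString(" ")).toSet.size

  def main(args: Array[String]): Unit = {
    val ts = tokens("the cat sat on the mat and the cat ran")
    println(s"unigrams=${uniqueUnigrams(ts)} bigrams=${uniqueBigrams(ts)}")
  }
}
```

On a real corpus the gap is usually much wider than on a toy sentence, since most word pairs occur only a handful of times.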