On Sat, May 26, 2012 at 2:59 AM, Yang <teddyyyy...@gmail.com> wrote:
> I tested with more threads / processes. indeed this is completely
> cpu-bound, since running 1 thread gives the same latency as 4 threads
> (my box has 4 cores)
>
> given this, is there any way to simplify the scoring computation (i'm
> only using lucene as a first level "rough" search, so the search quality
> is not a huge issue here), so that, for example, fewer fields are
> evaluated or a simpler scoring function is used?

Are you using disjunction or conjunction queries? Can you make some parts
of the query mandatory?

simon

>
> thanks
> Yang
>
> On Fri, May 25, 2012 at 5:47 PM, Yang <teddyyyy...@gmail.com> wrote:
>
>> thanks a lot guys
>>
>> On Tue, May 22, 2012 at 1:34 AM, Ian Lea <ian....@gmail.com> wrote:
>>
>>> Lots of good tips in
>>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, linked from
>>> the FAQ.
>>>
>>> --
>>> Ian.
>>>
>>> On Tue, May 22, 2012 at 2:08 AM, Li Li <fancye...@gmail.com> wrote:
>>> > something went wrong when writing in my android client.
>>> > if RAMDirectory does not help, i think the bottleneck is cpu. you may
>>> > try to tune the jvm but i do not expect much improvement.
>>> > the best option is splitting your index into 2 or more smaller ones.
>>> > you can then use Solr's distributed searching.
>>> > if the cpu is not fully used, you can do this on one physical machine
>>> >
>>> > On 2012-5-22 at 8:50 AM, "Li Li" <fancye...@gmail.com> wrote:
>>> >>
>>> >> On 2012-5-22 at 4:59 AM, "Yang" <teddyyyy...@gmail.com> wrote:
>>> >>
>>> >> > I'm trying to make my search faster. right now a query like
>>> >> >
>>> >> > name:Joe Moe Pizza address:77 main street city:San Francisco
>>> >> >is this a conjunction query or a disjunction query?
>>> >>
>>> >> > in an index with 20mil such short business descriptions (total
>>> >> > size about 3GB) takes about 100--200ms.
>>> >> >20m is not a small size, how many results for a query on average?
>>> >>
>>> >> > I profiled the query, most time is spent in TermScorer.score(), as
>>> >> > is shown by the attached yourkit screenshot.
>>> >> >that's true, for a query, matching and scoring are very time
>>> >> >consuming and cpu intensive. another one is io for reading postings.
>>> >>
>>> >> > I tried loading the index onto tmpfs (in-memory block device), and
>>> >> > also tried RAMDirectory, neither helps much.
>>> >> >if that is true. it seems that io is not the
>>> >> > I am reading
>>> >> > http://www.cnlp.org/presentations/slides/AdvancedLuceneEU.pdf
>>> >> > it mentions
>>> >> > Size
>>> >> > – Stopword removal
>>> >> > – Stemming
>>> >> >   • Lucene has a number of stemmers available
>>> >> >   • Light versus Aggressive
>>> >> >   • May prevent fine-grained matches in some cases
>>> >> > – Not a linear factor (usually) due to index compression
>>> >> >
>>> >> > so for "stopword removal", I'm already using the standard analyzer,
>>> >> > so stop word removal is already included, right?
>>> >> >
>>> >> > also generally any other tricks to try for reducing the search
>>> >> > latency?
>>> >> >
>>> >> > Thanks!
>>> >> > Yang
>>> >> >
>>> >> > ---------------------------------------------------------------------
>>> >> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> >> > For additional commands, e-mail: java-user-h...@lucene.apache.org
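
For readers following the thread: Simon's question about disjunction versus conjunction matters because the time spent scoring grows with the number of candidate documents. The sketch below is not Lucene code. It is a toy in-memory inverted index in Python, and the names `postings`, `disjunction`, `conjunction` and `mandatory_plus_optional` are hypothetical, chosen only for illustration. It shows how an OR query scores every document that matches any term, while an AND query (or a query with one mandatory term) scores far fewer.

```python
# Toy inverted index: term -> sorted list of doc ids.
# This is an illustrative model only, not Lucene's actual data structures.
postings = {
    "joe": [1, 4, 7, 9],
    "pizza": [2, 4, 5, 7, 8, 9, 10],
    "main": [1, 2, 3, 4, 6, 7, 8, 10],
}


def disjunction(terms):
    """OR query: any doc matching at least one term is scored."""
    scores = {}
    for term in terms:
        for doc in postings.get(term, []):
            # +1 per matching term stands in for a real per-term score.
            scores[doc] = scores.get(doc, 0) + 1
    return scores


def conjunction(terms):
    """AND query: only docs containing every term are scored."""
    # Start from the shortest postings list to keep the candidate set small.
    lists = sorted((postings.get(t, []) for t in terms), key=len)
    candidates = set(lists[0]) if lists else set()
    for plist in lists[1:]:
        candidates &= set(plist)
    return {doc: len(terms) for doc in candidates}


def mandatory_plus_optional(required, optional):
    """Required terms pick the candidates; optional terms only add score."""
    scores = conjunction(required)
    for term in optional:
        for doc in postings.get(term, []):
            if doc in scores:
                scores[doc] += 1
    return scores


query = ["joe", "pizza", "main"]
or_hits = disjunction(query)
and_hits = conjunction(query)
mixed_hits = mandatory_plus_optional(["joe"], ["pizza", "main"])
print(len(or_hits), "docs scored for OR")
print(len(and_hits), "docs scored for AND")
print(len(mixed_hits), "docs scored with one mandatory term")
```

In this toy the optional postings are still walked in full; a real engine can typically skip ahead in the postings lists, so the saving from mandatory clauses may be larger than the sketch suggests. Whether it helps on the 20-million-document index discussed above would have to be measured.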