Re: Performance issue

Matthew Hall Mon, 02 Feb 2009 11:35:35 -0800

Do you NEED to be using 7 fields here?

Like Erick said, if you could give us an example of the types of data you are trying to search against, it would be quite helpful.

It's possible that you could collapse your 7 fields down to a single field, which would likely reduce the overall number of OR clauses in your searches and speed things up nicely.
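To illustrate why a single field can mean fewer OR clauses, here is a rough sketch in plain Java (no Lucene involved). The field names and terms are made up for the example, and it simply counts how many terms a prefix would expand to. It is not a description of how Lucene counts clauses internally.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CollapsedFieldSketch {
    // Counts the terms in one field that a prefix would expand to.
    static int matchingTerms(Collection<String> terms, String prefix) {
        int count = 0;
        for (String term : terms) {
            if (term.startsWith(prefix)) count++;
        }
        return count;
    }

    // One expansion per field: the clause counts add up.
    static int clausesAcrossFields(Map<String, Set<String>> fields, String prefix) {
        int total = 0;
        for (Set<String> terms : fields.values()) {
            total += matchingTerms(terms, prefix);
        }
        return total;
    }

    // One catch-all field: a term shared by several fields is expanded once.
    static int clausesCollapsed(Map<String, Set<String>> fields, String prefix) {
        Set<String> all = new TreeSet<String>();
        for (Set<String> terms : fields.values()) {
            all.addAll(terms);
        }
        return matchingTerms(all, prefix);
    }

    public static void main(String[] args) {
        // Hypothetical fields and terms, for illustration only.
        Map<String, Set<String>> fields = new LinkedHashMap<String, Set<String>>();
        fields.put("name", new HashSet<String>(Arrays.asList("john", "jones")));
        fields.put("city", new HashSet<String>(Arrays.asList("johannesburg", "jones")));
        fields.put("notes", new HashSet<String>(Arrays.asList("john", "joint")));

        System.out.println("separate fields: " + clausesAcrossFields(fields, "jo"));
        System.out.println("collapsed field: " + clausesCollapsed(fields, "jo"));
    }
}
```

The trade-off is that a collapsed field no longer tells you which original field matched, so it only fits if you don't need that distinction.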

On my project we run two-letter prefix searches in under a second against much larger datasets. A lot of this, however, is directly due to how our indexes are structured.


-Matt

Erick Erickson wrote:

Prefix queries are expensive here. The problem is
that each one forms a very large OR clause on all
the terms that start with those two letters. For instance,
if a field in your index contained
mine
milanta
mica

a prefix search on "mi" would form
mine OR milanta OR mica.

Doing this across seven fields could get expensive.
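The expansion described above can be sketched in plain Java with a sorted term set. This is only an illustration of the idea, not Lucene's actual code; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixExpansion {
    // Returns every term that starts with the prefix, in sorted order.
    static List<String> expand(SortedSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<String>();
        // tailSet starts at the first term >= prefix; matching terms are
        // contiguous in sorted order, so stop at the first non-match.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) break;
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> terms =
                new TreeSet<String>(Arrays.asList("mine", "milanta", "mica", "moss"));
        // Join the expanded terms the way the OR clause would read.
        StringBuilder query = new StringBuilder();
        for (String term : expand(terms, "mi")) {
            if (query.length() > 0) query.append(" OR ");
            query.append(term);
        }
        System.out.println(query);
    }
}
```

With a real index a two-letter prefix can match thousands of terms, and each of the seven fields gets its own expansion.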

Two things:
1> what is the problem you are trying to solve? Perhaps some
of the folks on the list can give you some suggestions. You can
think about many strategies depending upon what you want
to accomplish. A 300M index isn't very big, so you could, for
instance, think about indexing a separate field that contains only
the two beginning letters and search *that* in this case. I'll
assume that three letter prefix queries are OK.

2> How are you measuring query time? If you're measuring the
time it takes when you first start a searcher, be aware that the
first few queries are usually slow because the caches haven't
been filled. Further, are you measuring total response time or
are you measuring *just* the query time? It's possible that the
time is being spent assembling the response in your code
rather than actual searching. You might insert some timers
to determine that.
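Both points can be sketched together in plain Java. The class below is a toy stand-in for an index (it is not Lucene, and all names are hypothetical): it stores the first two letters of every token as an exact-match key, so a two-letter search becomes a single lookup instead of a large OR. The main method shows one way to time the lookup separately from assembling the response, after a warm-up call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TwoLetterPrefixField {
    // Maps each two-letter prefix to the ids of documents that contain
    // at least one token starting with it.
    private final Map<String, List<Integer>> prefixToDocs =
            new HashMap<String, List<Integer>>();
    private final List<String> docs = new ArrayList<String>();

    void add(String text) {
        int docId = docs.size();
        docs.add(text);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.length() < 2) continue;
            String prefix = token.substring(0, 2);
            List<Integer> ids = prefixToDocs.get(prefix);
            if (ids == null) {
                ids = new ArrayList<Integer>();
                prefixToDocs.put(prefix, ids);
            }
            // Record each document once per prefix.
            if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) ids.add(docId);
        }
    }

    // Exact-match lookup on the stored prefix: no term expansion needed.
    List<Integer> search(String twoLetters) {
        List<Integer> ids = prefixToDocs.get(twoLetters.toLowerCase(Locale.ROOT));
        if (ids == null) return Collections.<Integer>emptyList();
        return Collections.unmodifiableList(ids);
    }

    String text(int docId) {
        return docs.get(docId);
    }

    public static void main(String[] args) {
        TwoLetterPrefixField index = new TwoLetterPrefixField();
        index.add("Holiday home");
        index.add("John Hopkins");
        index.add("Paris");

        index.search("ho"); // warm-up call, not timed

        long t0 = System.nanoTime();
        List<Integer> hits = index.search("ho");
        long t1 = System.nanoTime();
        StringBuilder response = new StringBuilder();
        for (int id : hits) response.append(index.text(id)).append('\n');
        long t2 = System.nanoTime();

        System.out.println("search ns: " + (t1 - t0) + ", assembly ns: " + (t2 - t1));
        System.out.print(response);
    }
}
```

In a real application the same two timers would go around the search call and around the code that builds the response, so you can see which one dominates.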

Best
Erick

On Mon, Feb 2, 2009 at 2:58 AM, Mittal, Sourabh (IDEAS) <
sourabh-931.mit...@morganstanley.com> wrote:

Hi All,

We face serious performance issues when users do two-letter searches, e.g. ho,
jo, pa ma, um ar, ma fi etc. The time taken is between 10 and 15 seconds.
Below are our implementation details:

1. Search performs on 7 fields.
2. PrefixQuery implementation on all fields
3. AND search.
4. Our index size is 300 MB.
5. We show only the top 100 documents, ranked by score.
6. We use StandardAnalyzer & StandardTokenizer for indexing &
searching.
7. Lucene 2.4
8. JDK 1.6

Please suggest how we can improve the performance.

Regards,
Sourabh Mittal
Morgan Stanley | IDEAS Practice Areas
Manikchand Ikon | South Wing 18 | Dhole Patil Road
Pune, 411001
Phone: +91 20 2620-7053
sourabh-931.mit...@morganstanley.com





