On Nov 20, 2007, at 6:29 AM, kumarlimbu wrote:
Our document contains a total of 23 fields in one document and we
STORE all
of them in lucene index.
We have recently had some performance issues and our analysis has
shown the
bottleneck to be lucene search and retrieval.
Perhaps you can share your information on java-dev along with any
detailed tests, etc. so that we can see if there is anything we can
improve.
We have been thinking about reducing the number of fields per
document by
removing unnecessary fields and by merging fields with similar
weightings.
Will reducing the number of fields help to optimize performance?
Yes
Another issue is we are currently retrieving around 9 fields after
we do a
search. Some are long text of up to 1000 words. Is it a large
overhead to
retrieve long fields?
Yes. Are you using FieldSelector? Also, I would only STORE those
fields that you actually need to display, not all 23. Do you display
all 9 fields right away or are some only when you choose a document?
If so, try the Lazy Loading piece of FieldSelector.
We are considering the option of separating the search and retrieve
parts so
that Lucene performs the search, MYSQL stores the data. We just
store the
INDEXED field and primary key in the lucene index. After searching
we only
return 1 field (primary key) instead of 9 fields. This field will be
used
for retrieving the actual information from the MySQL database. Will
reducing
the number of fields retrieved from lucene reduce the response time
or will
using MySQL database make it worse?
Hard to say, you will likely have to try it out.
So our main concern is to find if retrieving fields usually takes
longer
than searching or not? What does lucene spend most time doing –
search or
retrieval? We are also concerned that using MySQL will have
performance
issues because we will be doing I/O for MySQL as well as Lucene. We
also add
around 100k documents each day and remove around the same number of
documents. Will this frequent read and write have impact on
performance?
What version of Lucene are you using? Naturally, if you are updating
things that will have an effect, but that may not be a factor. I
would check out http://wiki.apache.org/lucene-java/BasicsOfPerformance
Also, you may want to try out (not in production) the trunk version of
Lucene.
Last, but not least, do you have some sort of cache for your
Documents? Obviously, you need to have appropriate semantics for when
the underlying docs change, but using a cache makes sense, too.
-Grant
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]