If the current version is working well, what is the reason to move?
Is it just to make management of the indices easier?
On Sep 20, 2007, at 12:07 PM, Nikhil Chhaochharia wrote:
OK, thanks.
I actually have both systems implemented. The multi-index one is
being used currently and it works well. I have deployed the single
index solution a few times during off-peak hours and the response
time has been almost the same as the multi-index solution. I tried
to simulate some load but again my numbers were mostly similar for
both cases.
I have already done all the suggested optimizations since I first
ran into problems a few months ago. The performance had improved
considerably. Since then, my traffic has increased and I have
again started facing some issues during peak-load hours.
I guess I should get another box and run proper tests there. Will
run a profiler also.
Thanks for all the suggestions.
Regards,
Nikhil
----- Original Message ----
From: Grant Ingersoll <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, 20 September, 2007 9:25:01 PM
Subject: Re: Multiple Indices vs Single Index
OK, I thought you meant your index would have in it the name of the
second index and would thus do a two-stage retrieval.
At any rate, if you are saying your combined index with all the
stored fields is ~3.4 GB I would think it would fit reasonably on the
machine you have and perform reasonably. Naturally, this depends on
your application, your users, etc. and I can't make any guarantees,
but I certainly recall others managing this size just fine. See the
many tips on improving searching and indexing on the Wiki (link at
bottom in my signature) and do some profiling/testing.
When you said your tests were inconclusive, what tests have you
done? If you can, run the tests in a profiler to see where your
bottlenecks are.
-Grant
On Sep 20, 2007, at 11:16 AM, Nikhil Chhaochharia wrote:
I am sorry, it seems that I was not clear with what my problem is.
I will try to describe it again.
My data is divided into 40 categories and at one time only one
category can be searched. The GUI for the system will ask the user
to select the category from a drop-down. Currently, I have a
separate index for every category. The index sizes varies - one
category index is 10MB and another is 700MB. Other index-sizes are
somewhere in between.
I was wondering if it will be better to just have 1 large index
with all the 40 indices combined. I do not need to do dual-queries
and my total index size (if I create a single index) is about
3.4GB. It will increase to maximum of 5-6 GB. I am running this
on a dedicated machine with 8GB RAM.
Unfortunately I do not have enough hardware to run both in parallel
and test properly. Have just one server which is being used by
live users. So it would be great if you could tell me whether I
should stick with my 40 indices or combine them into 1 index. What
are the pros and cons of each approach ?
Thanks,
Nikhil
----- Original Message ----
From: Grant Ingersoll <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, 20 September, 2007 7:57:21 PM
Subject: Re: Multiple Indices vs Single Index
If I understand correctly, you want to do a two stage retrieval
right? That is, look up in the initial index (3.4 GB) and then do a
second search on the sub index? Presumably, you have to manage the
Searchers, etc. for each of the sub-indexes as well as the big
index. This means you have to go through the hits from the first
search, then route, etc. correct?
Have you tried creating one single index with all the (stored)
fields, etc? Worst case scenario, assuming 1GB per index, is you
would have a 40GB index, but my guess is index compression will
reduce it more. Since you are less than that anyway, have you tried
just the straightforward solution? Or do you have other requirements
that force the sub-index solution? Also, I am not sure it will work,
but it seems worth a try. Of course, this also depends on how much
you expect your indexes to grow.
Also, what was inconclusive about your tests? Maybe you can describe
more what you have tried to date?
Cheers,
Grant
On Sep 20, 2007, at 3:50 AM, Nikhil Chhaochharia wrote:
Hi,
I have about 40 indices which range in size from 10MB to 700MB.
There are quite a few stored fields. To get an idea of the
document size, I have about 400k documents in the 700MB index.
Depending on the query, I choose the index which needs to be
searched. Each query hits only one index. I was wondering if
creating a single index where every document will have the
indexname as a field will be more efficient. I created such an
index and it was 3.4 GB in size. My initial performance tests with
it are not conclusive.
Also, what are the other points to be addressed while deciding
between 1 index and 40 indices.
I have 8GB RAM on the machine.
Thanks,
Nikhil
--------------------------------------------------------------------
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]