OK, thanks. I actually have both systems implemented. The multi-index one is being used currently and it works well. I have deployed the single index solution a few times during off-peak hours and the response time has been almost the same as the multi-index solution. I tried to simulate some load but again my numbers were mostly similar for both cases.
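A rough way to put numbers on a comparison like this is a small concurrent timing harness. The sketch below is plain JDK code, not Lucene code: `LoadComparisonSketch`, `measure` and `percentile` are hypothetical names, and the `search` callback stands in for whatever actually runs a query against the multi-index or single-index setup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical harness: times a search callback under concurrent load. */
final class LoadComparisonSketch {

    /**
     * Runs every query through `search` from `threads` concurrent clients,
     * `roundsPerThread` times each, and returns all per-call latencies
     * in nanoseconds, sorted ascending.
     */
    static long[] measure(Consumer<String> search, List<String> queries,
                          int threads, int roundsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<long[]>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<long[]> client = () -> {
                    long[] samples = new long[roundsPerThread * queries.size()];
                    int i = 0;
                    for (int r = 0; r < roundsPerThread; r++) {
                        for (String q : queries) {
                            long start = System.nanoTime();
                            search.accept(q);
                            samples[i++] = System.nanoTime() - start;
                        }
                    }
                    return samples;
                };
                futures.add(pool.submit(client));
            }
            // Merge the samples from all simulated clients into one array.
            long[] all = new long[threads * roundsPerThread * queries.size()];
            int pos = 0;
            for (Future<long[]> f : futures) {
                long[] s = f.get();
                System.arraycopy(s, 0, all, pos, s.length);
                pos += s.length;
            }
            Arrays.sort(all);
            return all;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("load run failed", e);
        } finally {
            pool.shutdown();
        }
    }

    /** Nearest-rank percentile of an ascending-sorted array; p in (0, 100]. */
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```

Running `measure` once per setup with the same query list and comparing `percentile(samples, 50)` against `percentile(samples, 95)` may show differences that an average hides, since peak-load problems tend to appear in the slowest requests first.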
I have already done all the suggested optimizations since I first ran into problems a few months ago. The performance had improved considerably. Since then, my traffic has increased and I have again started facing some issues during peak-load hours.

I guess I should get another box and run proper tests there. Will run a profiler also.

Thanks for all the suggestions.

Regards,
Nikhil


----- Original Message ----
From: Grant Ingersoll <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, 20 September, 2007 9:25:01 PM
Subject: Re: Multiple Indices vs Single Index

OK, I thought you meant your index would have in it the name of the second index and would thus do a two-stage retrieval.

At any rate, if you are saying your combined index with all the stored fields is ~3.4 GB I would think it would fit reasonably on the machine you have and perform reasonably. Naturally, this depends on your application, your users, etc. and I can't make any guarantees, but I certainly recall others managing this size just fine. See the many tips on improving searching and indexing on the Wiki (link at bottom in my signature) and do some profiling/testing.

When you said your tests were inconclusive, what tests have you done? If you can, run the tests in a profiler to see where your bottlenecks are.

-Grant

On Sep 20, 2007, at 11:16 AM, Nikhil Chhaochharia wrote:

> I am sorry, it seems that I was not clear with what my problem is.
> I will try to describe it again.
>
> My data is divided into 40 categories and at one time only one
> category can be searched. The GUI for the system will ask the user
> to select the category from a drop-down. Currently, I have a
> separate index for every category. The index sizes varies - one
> category index is 10MB and another is 700MB. Other index-sizes are
> somewhere in between.
>
> I was wondering if it will be better to just have 1 large index
> with all the 40 indices combined.
> I do not need to do dual-queries
> and my total index size (if I create a single index) is about
> 3.4GB. It will increase to maximum of 5-6 GB. I am running this
> on a dedicated machine with 8GB RAM.
>
> Unfortunately I do not have enough hardware to run both in parallel
> and test properly. Have just one server which is being used by
> live users. So it would be great if you could tell me whether I
> should stick with my 40 indices or combine them into 1 index. What
> are the pros and cons of each approach ?
>
> Thanks,
> Nikhil
>
>
> ----- Original Message ----
> From: Grant Ingersoll <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Thursday, 20 September, 2007 7:57:21 PM
> Subject: Re: Multiple Indices vs Single Index
>
> If I understand correctly, you want to do a two stage retrieval
> right? That is, look up in the initial index (3.4 GB) and then do a
> second search on the sub index? Presumably, you have to manage the
> Searchers, etc. for each of the sub-indexes as well as the big
> index. This means you have to go through the hits from the first
> search, then route, etc. correct?
>
> Have you tried creating one single index with all the (stored)
> fields, etc? Worst case scenario, assuming 1GB per index, is you
> would have a 40GB index, but my guess is index compression will
> reduce it more. Since you are less than that anyway, have you tried
> just the straightforward solution? Or do you have other requirements
> that force the sub-index solution? Also, I am not sure it will work,
> but it seems worth a try. Of course, this also depends on how much
> you expect your indexes to grow.
>
> Also, what was inconclusive about your tests? Maybe you can describe
> more what you have tried to date?
>
> Cheers,
> Grant
>
> On Sep 20, 2007, at 3:50 AM, Nikhil Chhaochharia wrote:
>
>> Hi,
>>
>> I have about 40 indices which range in size from 10MB to 700MB.
>> There are quite a few stored fields.
>> To get an idea of the
>> document size, I have about 400k documents in the 700MB index.
>>
>> Depending on the query, I choose the index which needs to be
>> searched. Each query hits only one index. I was wondering if
>> creating a single index where every document will have the
>> indexname as a field will be more efficient. I created such an
>> index and it was 3.4 GB in size. My initial performance tests with
>> it are not conclusive.
>>
>> Also, what are the other points to be addressed while deciding
>> between 1 index and 40 indices.
>>
>> I have 8GB RAM on the machine.
>>
>> Thanks,
>> Nikhil
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
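
For readers skimming the thread, the two layouts being compared can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: it uses no Lucene classes, every name in it (`IndexLayoutSketch`, `ToyIndex`, `MultiIndex`, `SingleIndex`, the `category:` term prefix) is hypothetical, and it says nothing about the relative performance of the real setups. It only shows that selecting an index by category and filtering one combined index on a category field should return the same hits.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory model of the two layouts discussed in the thread (not Lucene code). */
final class IndexLayoutSketch {

    /** Minimal inverted index: lowercased term -> sorted set of document ids. */
    static final class ToyIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        void add(String docId, String text) {
            for (String term : text.toLowerCase().split("\\s+")) {
                if (term.isEmpty()) continue;
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }

        /** Returns a fresh copy of the matching ids so callers may modify it. */
        Set<String> search(String term) {
            return new TreeSet<>(postings.getOrDefault(term.toLowerCase(), new TreeSet<>()));
        }
    }

    /** Layout A: one index per category; the category selects which index to search. */
    static final class MultiIndex {
        private final Map<String, ToyIndex> byCategory = new HashMap<>();

        void add(String category, String docId, String text) {
            byCategory.computeIfAbsent(category, c -> new ToyIndex()).add(docId, text);
        }

        Set<String> search(String category, String term) {
            ToyIndex index = byCategory.get(category);
            return index == null ? new TreeSet<>() : index.search(term);
        }
    }

    /**
     * Layout B: one combined index; the category is indexed as an extra term
     * on every document (assumes category names contain no whitespace).
     */
    static final class SingleIndex {
        private final ToyIndex combined = new ToyIndex();

        void add(String category, String docId, String text) {
            combined.add(docId, text + " category:" + category);
        }

        Set<String> search(String category, String term) {
            Set<String> hits = combined.search(term);
            // Keep only the hits that also carry the requested category term.
            hits.retainAll(combined.search("category:" + category));
            return hits;
        }
    }
}
```

In this model the single-index search intersects two posting lists on every query, while the multi-index search does one map lookup and then searches a smaller index. Whether that difference matters at 3.4 GB is what the profiling suggested above would have to show.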