So it looks like you are not really doing much sorting? The index divisor affects reader.terms(), but it does not have much to do with sorting.
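For concreteness, a minimal sketch of Zhibin's workaround from the thread below, assuming the Lucene 2.3-era API (the index path is a placeholder, and as far as I know the divisor has to be set before the term index is first loaded):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;

    public class DivisorDemo {
        public static void main(String[] args) throws IOException {
            IndexReader reader = IndexReader.open("/path/to/index");
            // Keep only every 4th entry of the term index in RAM, cutting
            // the in-heap term dictionary cache to roughly a quarter at the
            // cost of a slightly longer scan per term lookup.
            reader.setTermInfosIndexDivisor(4);
            IndexSearcher searcher = new IndexSearcher(reader);
            // ... run searches as usual ...
            searcher.close();
            reader.close();
        }
    }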
--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site (anonymous per request), got 2.6 Million Euro funding!

On Mon, Nov 17, 2008 at 6:21 PM, Zhibin Mai <[EMAIL PROTECTED]> wrote:

> It is a cache-tuning setting in IndexReader. It can be set via the method
> setTermInfosIndexDivisor(int).
>
> Thanks,
>
> Zhibin
>
>
> ________________________________
> From: Chris Lu <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, November 17, 2008 7:07:21 PM
> Subject: Re: how to estimate how much memory is required to support the large index search
>
> The calculation looks right. But what is the "index divisor" that you mentioned?
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site (anonymous per request), got 2.6 Million Euro funding!
>
> On Mon, Nov 17, 2008 at 5:00 PM, Zhibin Mai <[EMAIL PROTECTED]> wrote:
>
> > Aleksander,
> >
> > I figured out that most of the heap was consumed by the term cache. In our
> > case, the index has 233 million terms, and 6.4 million of them were loaded
> > into the cache when we did the search. I did a rough calculation of how
> > much memory each term needs: about 16 bytes for the Term object + 32 bytes
> > for the TermInfo object + 24 bytes for the String object holding the term
> > text + 2 * length(char[]) bytes for the term text itself.
> >
> > In our case, the average term text length is 25 characters, which means
> > each term needs at least 122 bytes. The cache for 6.4 million terms needs
> > 6.4M * 122 bytes = ~780MB. Plus 200MB for caching norms, the RAM used by
> > the caches is larger than 980MB. We worked around the term cache issue by
> > setting the index divisor of the IndexReader to a higher value. Actually,
> > search performance is good even with an index divisor of 4.
> >
> > Thanks,
> >
> > Zhibin
> >
> >
> > ________________________________
> > From: Aleksander M. Stensby <[EMAIL PROTECTED]>
> > To: java-user@lucene.apache.org
> > Sent: Monday, November 17, 2008 2:31:04 AM
> > Subject: Re: how to estimate how much memory is required to support the large index search
> >
> > One major factor that may cause heap-space problems is sorting during
> > search. Do you have any form of default sort in your application? Also,
> > the type of field used for sorting is important with regard to memory
> > consumption.
> >
> > This issue has been discussed before on the list. (You can search the
> > archive for sorting and memory consumption.)
> >
> > - Aleksander
> >
> > On Sun, 16 Nov 2008 14:36:36 +0100, Zhibin Mai <[EMAIL PROTECTED]> wrote:
> >
> > > Hello,
> > >
> > > I am a beginner with Lucene. We developed an application that creates
> > > and searches an index using Lucene 2.3.1. We would like to know how to
> > > estimate how much memory is required to search a given index.
> > >
> > > Recently, the size of the index reached about 200GB, with 197M
> > > documents and 223M terms.
> > > Our application has started throwing intermittent "OutOfMemoryError:
> > > Java heap space" errors when we use it to search the index. We used
> > > JProfiler to capture the following memory allocation during a single
> > > keyword search:
> > >
> > > char[] 332MB
> > > org.apache.lucene.index.TermInfo 194MB
> > > java.lang.String 146MB
> > > org.apache.lucene.index.Term 99,823KB
> > > org.apache.lucene.index.Term[] 24,956KB
> > > org.apache.lucene.index.TermInfo[] 24,956KB
> > >
> > > byte[] 188MB
> > > long[] 49,912KB
> > >
> > > The memory allocated for the first six object types does not change when
> > > we change the search criteria. Could you please advise which major
> > > factors affect the memory allocation, and precisely how they affect
> > > memory usage during search? Is it possible to reduce the memory usage on
> > > search?
> > >
> > > Thank you,
> > >
> > > Zhibin
> >
> > --
> > Aleksander M. Stensby
> > Senior software developer
> > Integrasco A/S
> > www.integrasco.no
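To make Zhibin's per-term arithmetic concrete, here is a small sketch that reproduces the estimate; the per-object byte counts are the thread's rough figures (actual object header sizes vary by JVM), not measured values:

    public class TermCacheEstimate {
        public static void main(String[] args) {
            int termBytes = 16;      // Term object
            int termInfoBytes = 32;  // TermInfo object
            int stringBytes = 24;    // String object holding the term text
            int avgTermLength = 25;  // average term text length in chars
            // 2 bytes per char for the term text itself
            int perTerm = termBytes + termInfoBytes + stringBytes + 2 * avgTermLength;

            long cachedTerms = 6400000L; // terms loaded into the cache
            long cacheBytes = cachedTerms * perTerm;
            System.out.println("bytes per term: " + perTerm);              // 122
            System.out.println("term cache MB:  " + cacheBytes / 1000000); // 780
        }
    }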
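Aleksander's point about sort fields can be sketched the same way, against the Lucene 2.3-era search API (the index path and field names here are made up): with a string sort, the FieldCache has to hold every distinct value of the field plus a per-document order array, while an int sort caches only one 4-byte int per document, so the field type matters enormously at 197M documents.

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;

    public class SortMemoryDemo {
        public static void main(String[] args) throws IOException {
            IndexReader reader = IndexReader.open("/path/to/index");
            IndexSearcher searcher = new IndexSearcher(reader);
            TermQuery query = new TermQuery(new Term("body", "lucene"));

            // String sort: FieldCache keeps a per-document order array plus
            // every distinct value of "title" on the heap.
            Hits byTitle = searcher.search(query,
                    new Sort(new SortField("title", SortField.STRING)));

            // Int sort: FieldCache keeps one 4-byte int per document instead.
            Hits byDate = searcher.search(query,
                    new Sort(new SortField("date", SortField.INT)));

            System.out.println(byTitle.length() + " / " + byDate.length());
            searcher.close();
            reader.close();
        }
    }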