BTW, upcoming changes in Lucene for flexible indexing should improve
the RAM usage of the terms index substantially:
https://issues.apache.org/jira/browse/LUCENE-1458
In the current (first) iteration of that patch, TermInfo is no longer
used at all when loading the terms index. For a typical index, I think
this will likely cut the RAM used by the terms index roughly in half.
But... this won't be available for some time (it's still a work in
progress).
Mike
Chris Lu wrote:
So it looks like you are not really doing much sorting? The index
divisor affects reader.terms() lookups, but has little to do with
sorting.
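For what it's worth, here is a rough fragment showing the kind of lookup
the divisor does affect (assuming an already-open IndexReader named
reader; the field and term are made up):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;

    // Illustrative only: lookups like this one go through the in-RAM
    // terms index that the divisor thins out. reader.terms(t) seeks to
    // the first term >= t and returns an already-positioned TermEnum.
    TermEnum te = reader.terms(new Term("title", "lucene"));
    try {
        do {
            Term t = te.term();
            if (t == null || !"title".equals(t.field())) break;
            System.out.println(t.text() + " docFreq=" + te.docFreq());
        } while (te.next());
    } finally {
        te.close();
    }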
--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!
On Mon, Nov 17, 2008 at 6:21 PM, Zhibin Mai <[EMAIL PROTECTED]> wrote:
It is a cache tuning setting in IndexReader. It can be set via the
method setTermInfosIndexDivisor(int).
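A minimal sketch of how it might be used, assuming the Lucene 2.3.x API
(the index path is illustrative):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;

    // Must be called before the terms index is first loaded (i.e.
    // before any term lookup); afterwards it throws IllegalStateException.
    IndexReader reader = IndexReader.open("/path/to/index");
    reader.setTermInfosIndexDivisor(4); // keep only every 4th terms-index entry in RAM
    IndexSearcher searcher = new IndexSearcher(reader);

The trade-off is that term lookups scan a little further on average from
the nearest in-RAM index entry, in exchange for 1/N of the terms-index RAM.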
Thanks,
Zhibin
________________________________
From: Chris Lu <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, November 17, 2008 7:07:21 PM
Subject: Re: how to estimate how much memory is required to support the large index search
Calculation looks right. But what's the "Index divisor" that you mentioned?
--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!
On Mon, Nov 17, 2008 at 5:00 PM, Zhibin Mai <[EMAIL PROTECTED]> wrote:
Aleksander,

I figured out that most of the heap was consumed by the Term cache. In
our case, the index has 233 million terms, and 6.4 million of them were
loaded into the cache when we did the search. I did a rough calculation
of how much memory each term needs: about 16 bytes for the Term object
+ 32 bytes for the TermInfo object + 24 bytes for the String object for
the term text + 2 * length(char[]) bytes for the term text itself.

In our case the average term text is 25 characters long, which means
each term needs at least 122 bytes. The cache for 6.4 million terms
therefore needs 6.4M * 122 bytes = ~780MB. Plus 200MB for caching norms,
the RAM used for caches is larger than 980MB. We worked around the Term
cache issue by setting the index divisor of the IndexReader to a higher
value. Search performance is actually still good even with an index
divisor of 4.
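As a sanity check, the arithmetic above can be written out like this
(the object overheads are the rough figures from this thread, not
measured values, and will vary by JVM and architecture):

    // Back-of-the-envelope check of the numbers above.
    long termObj   = 16;  // org.apache.lucene.index.Term
    long termInfo  = 32;  // org.apache.lucene.index.TermInfo
    long stringObj = 24;  // String wrapper around the term text
    int  avgLen    = 25;  // average term length in this index
    long perTerm   = termObj + termInfo + stringObj + 2L * avgLen; // = 122 bytes

    long cachedTerms = 6400000L;               // terms observed in the cache
    long cacheBytes  = cachedTerms * perTerm;  // ~780 MB
    long withDiv4    = cacheBytes / 4;         // divisor 4 keeps 1/4 as many: ~195 MB
    System.out.println(cacheBytes / 1000000 + " MB vs " + withDiv4 / 1000000 + " MB");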
Thanks,
Zhibin
________________________________
From: Aleksander M. Stensby <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, November 17, 2008 2:31:04 AM
Subject: Re: how to estimate how much memory is required to support the large index search
One major factor that may result in heap space problems is if you are
doing any form of sorting when searching. Do you have any form of
default sort in your application? Also, the type of field used for
sorting is important with regard to memory consumption. This issue has
been discussed before on the list. (You can search the archive for
sorting and memory consumption.)
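As an illustration, assuming a searcher and query from your application
and a made-up "date" field, a sorted search like this is what populates
the FieldCache:

    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;

    // The first search sorted on a field fills the FieldCache with one
    // entry per document for that field, and the cache stays in RAM for
    // the reader's lifetime.
    Sort sort = new Sort(new SortField("date", SortField.STRING));
    Hits hits = searcher.search(query, sort);
    // An INT sort field costs roughly 4 bytes per document; a STRING
    // sort additionally caches the distinct term values themselves.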
- Aleksander
On Sun, 16 Nov 2008 14:36:36 +0100, Zhibin Mai <[EMAIL PROTECTED]> wrote:
Hello,

I am a beginner at using Lucene. We developed an application to create
and search an index using Lucene 2.3.1. We would like to know how to
estimate how much memory is required to search a given index.

Recently, the size of the index has reached about 200GB, with 197M
documents and 223M terms. Our application has started throwing
intermittent "OutOfMemoryError: Java heap space" when we use it to
search the index. We used JProfiler to get the following memory
allocation when we do one keyword search:
char[] 332MB
org.apache.lucene.index.TermInfo 194MB
java.lang.String 146MB
org.apache.lucene.index.Term 99,823KB
org.apache.lucene.index.Term 24,956KB
org.apache.lucene.index.TermInfo[] 24,956KB
byte[] 188MB
long[] 49,912KB
The memory allocation for the first 6 types of objects does not change
when we change the search criteria. Could you please give me some
advice on which major factors affect memory allocation during search,
and precisely how those factors affect memory usage? Is it possible to
reduce the memory usage of search?
Thank you,
Zhibin
--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no