Hello everyone!
Here are the conclusions we reached after digging further into the
problem; maybe they will help someone:
1) The filling up of the hard drive was not due to 64-bit Java; that was
coincidental.
2) The intermediate files Yonik talked about (*.f*) were present because
the indexing process was merging very large segments, which took a while
to complete.
3) We are indexing a continuous stream of data. As documents go
out-of-date they are deleted from the index. To sustain indexing
throughput we use a batch indexing strategy, setting mergeFactor to 50
but never optimizing. The downside is that it takes a long time to reach
the point where deleted documents are purged, since that only happens
when their out-of-date segments are merged. We end up with large
segments containing nothing but deleted documents, whose space could be
reclaimed if those segments were no longer referenced by the segments
file. (See the sketch after this list.)
4) Assuming that frequently merging into a large segment doesn't hurt
indexing throughput, we should probably have implemented the strategy
described by Doug Cutting here (scroll down):
http://www.gossamer-threads.com/lists/lucene/java-user/29350?page=last
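A minimal sketch of the batch setup described in point 3, assuming the
Lucene 1.4-era API where mergeFactor is a public field on IndexWriter;
the class name, index path, and analyzer are placeholders:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class BatchIndexSetup {
        public static void main(String[] args) throws Exception {
            // Open an existing index (path is a placeholder).
            IndexWriter writer = new IndexWriter("/data/index",
                                                 new StandardAnalyzer(),
                                                 false);
            writer.mergeFactor = 50; // batch up segments; merges are rare
            // ... writer.addDocument(...) calls for the incoming stream ...
            // optimize() is never called, so deleted documents linger
            // until their segments happen to be merged.
            writer.close();
        }
    }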
Hth,
casper & roxana
Thanks everyone for the answers!
I'm experimenting with your suggestions; I'll let you know if anything
interesting pops up.
roxana
1) make sure the failure was due to an OutOfMemoryError and not
something else.
2) if you have enough memory, increase the max JVM heap size (-Xmx)
3) if you don't need more than 1.5G or so of heap, use the 32-bit JVM
instead (depending on the architecture, it can actually be a little
faster because more references fit in the CPU cache).
4) see how many indexed fields you have and if you can consolidate any
of them
4.5) if you don't have too many indexed fields, and have enough spare
file descriptors, try using the non-compound file format instead (see
the sketch after this list)
5) run with the latest version of lucene (the 1.9 dev version), which
may have better memory usage during optimizes & segment merges.
6) If/when optional norms
(http://issues.apache.org/jira/browse/LUCENE-448) makes it into lucene,
you can apply it to any indexed fields for which you don't need
index-time boosting or length normalization.
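A minimal sketch of suggestions 2, 3, and 4.5 above, assuming the
Lucene 1.4-era API (which has IndexWriter.setUseCompoundFile); the class
name, index path, and heap size are placeholders:

    // Launch flags for suggestions 2 and 3 (example values only):
    //   java -Xmx1500m MyIndexer        -> bigger max heap
    //   java -d32 -Xmx1500m MyIndexer   -> 32-bit mode, where supported

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class MyIndexer {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/data/index",
                                                 new StandardAnalyzer(),
                                                 false);
            // Suggestion 4.5: the non-compound format writes one file per
            // segment component, so it needs more file descriptors.
            writer.setUseCompoundFile(false);
            writer.close();
        }
    }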
As for getting rid of your current intermediate files, I'd rebuild from
scratch just to ensure things are OK.
-Yonik
Now hiring -- http://tinyurl.com/7m67g
On 10/21/05, Roxana Angheluta <[EMAIL PROTECTED]> wrote:
Thank you, Yonik, it seems this is the case.
What can we do in this case? Would running the program with java -d32
be a solution?
Thanks again,
roxana
One possibility: if lucene runs out of memory while adding or
optimizing, it can leave unused files behind that increase the size of
the index. A 64-bit JVM will require more memory than a 32-bit one
because the size of every reference is doubled.
If you are using the compound file format (the default - check for .cfs
files), then it's easy to check whether you have this problem by seeing
if there are any *.f* files in the index directory. These are
intermediate files and shouldn't exist for long in a compound-file
index.
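A quick way to look for such leftovers (a minimal sketch; the directory
path is a placeholder, and the match on ".f" is deliberately loose):

    import java.io.File;
    import java.io.FilenameFilter;

    public class FindLeftovers {
        public static void main(String[] args) {
            File dir = new File("/data/index"); // placeholder index path
            // In a compound-file index, *.f* files are intermediate
            // artifacts that should only exist briefly during a merge.
            String[] names = dir.list(new FilenameFilter() {
                public boolean accept(File d, String name) {
                    return name.indexOf(".f") != -1;
                }
            });
            for (int i = 0; names != null && i < names.length; i++)
                System.out.println(names[i]);
        }
    }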
-Yonik
Now hiring -- http://tinyurl.com/7m67g