There have been some earlier messages in this thread about the memory consumption of Lucene Documents on 64-bit JVMs (roughly double that of 32-bit, because of the larger pointer size). We expect the index to grow very large, and we may end up maintaining more than one index with different analyzers for the same data set. Hence we are concerned about the index size as well. If there are ways to overcome that overhead, we're game for the 64-bit version as well :-)
Any ideas?

Thanks and regards,
Sithu Sudarsan
Graduate Research Assistant, UALR &
Visiting Researcher, CDRH/OSEL
[EMAIL PROTECTED]
[EMAIL PROTECTED]

-----Original Message-----
From: Toke Eskildsen [mailto:[EMAIL PROTECTED]]
Sent: Friday, October 24, 2008 10:43 AM
To: java-user@lucene.apache.org
Subject: RE: Multi-threaded indexing of large number of PDF documents

On Fri, 2008-10-24 at 16:01 +0200, Sudarsan, Sithu D. wrote:
> 4. We've tried using larger JVM space by defining -Xms1800m and
> -Xmx1800m, but it runs out of memory. Only -Xms1080m and -Xmx1080m
> seem stable. That is strange, as we have 32 GB of RAM and 34 GB of
> swap space. Typically no other application is running. However, the
> CentOS version is 32-bit. The Ungava project seems to be using 64-bit.

The <2GB heap limit for Java is a known problem under Windows. I don't
know about CentOS, but from your description it seems that the problem
exists on that platform too. Either way, you'll never get above 4GB for
Java when you're running 32-bit. Might I ask why you're not using
64-bit on a 32GB machine?

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
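As a quick sanity check (not from the original thread), a small Java program can report whether the running JVM is 32- or 64-bit and how large a heap it actually obtained. Note that `sun.arch.data.model` is a Sun/Oracle-specific property and may be absent on other JVMs, so this sketch falls back to `os.arch`:

```java
public class JvmCheck {
    public static void main(String[] args) {
        // Sun/Oracle JVMs report the pointer width here ("32" or "64");
        // other JVMs may not set this property, so fall back to os.arch.
        String dataModel = System.getProperty("sun.arch.data.model",
                System.getProperty("os.arch"));
        // Maximum heap the JVM will attempt to use (reflects -Xmx).
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("JVM data model: " + dataModel);
        System.out.println("Max heap (MB): " + maxHeapMb);
    }
}
```

Running it as, e.g., `java -Xmx1800m JvmCheck` also tests the limit directly: on a 32-bit JVM the launch itself fails if the requested heap cannot be reserved as one contiguous region of the process address space, which matches the ~1080m ceiling described above.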