Hi,

I am developing an application that uses Lucene to index and search 1
billion documents. (Each document is very small, though: a single field of
5-10 words, so I believe my total data size is within the tested limits.)

I am using the following configuration:
1.      1.5 GB RAM for the JVM
2.      100 GB of disk space
3.      Index creation tuning factors (see the sketch after this list):
a.      mergeFactor = 10
b.      maxFieldLength = 10
c.      maxMergeDocs = 5,000,000 (if I try a larger value, I get an
out-of-memory error)
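
For concreteness, here is a minimal sketch of how these factors might be
applied, assuming the classic IndexWriter setters (setMergeFactor /
setMaxFieldLength / setMaxMergeDocs) of the Lucene 2.x API; the index path
and field name are placeholders, not part of my actual code:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class TinyDocIndexer {
    public static void main(String[] args) throws Exception {
        // Open (or create) an on-disk index; "/data/index" is a placeholder path.
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/data/index"),
                new StandardAnalyzer(),
                true);  // true = create a fresh index

        // The tuning factors from the list above.
        writer.setMergeFactor(10);        // how many segments get merged at a time
        writer.setMaxFieldLength(10);     // index at most the first 10 terms per field
        writer.setMaxMergeDocs(5000000);  // cap on documents per merged segment

        // Each document is a single short text field of 5-10 words.
        Document doc = new Document();
        doc.add(new Field("text", "a short five to ten word value",
                Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(doc);

        writer.optimize();  // optional final merge before closing
        writer.close();
    }
}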

With these settings, I am able to create an index of 100 million documents
(10^8) in 15 minutes, consuming 2.5 GB of disk space. That is quite
satisfactory for me, but I would still like to know what else can be done to
tune it further. Please help.
Also, with these settings, can I expect the time and size to grow linearly
up to 1 billion (10^9) documents?
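
For reference, a purely linear extrapolation (which the merge behaviour may
or may not actually follow) would work out to roughly:

    time:  15 min  x 10  =  ~150 min (about 2.5 hours)
    size:  2.5 GB  x 10  =  ~25 GB

which would still leave plenty of headroom on the 100 GB disk, assuming the
per-document cost stays constant.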

Thanks and Regards,

Shelly Singh
Center for Knowledge Driven Information Systems, Infosys
Email: shelly_si...@infosys.com
Phone: (M) 91 992 369 7200, (VoIP)2022978622
