Hi,

I am developing an application which uses Lucene for indexing and searching 1 billion documents. (The documents are very small, though: each has a single field of 5-10 words, so I believe my data size is within the tested limits.)
I am using the following configuration:
1. 1.5 GB RAM for the JVM
2. 100 GB disk space
3. Index-creation tuning factors:
   a. mergeFactor = 10
   b. maxFieldLength = 10
   c. maxMergeDocs = 5000000 (if I try a larger value, I get an out-of-memory error)

With these settings, I am able to create an index of 100 million (10^8) documents in 15 minutes, consuming 2.5 GB of disk space. That is quite satisfactory for me, but nevertheless I want to know what else can be done to tune it further. Please help.

Also, with these settings, can I expect the time and size to grow linearly for 1 billion (10^9) documents?

Thanks and Regards,
Shelly Singh
Center For Knowledge Driven Information Systems, Infosys
Email: shelly_si...@infosys.com
Phone: (M) 91 992 369 7200, (VoIP) 2022978622
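P.S. For reference, this is roughly how I create the writer and apply the tuning factors above. It is a minimal sketch only: it assumes a Lucene 2.x-style IndexWriter API, and the index path and field name are placeholders rather than my actual values.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class BuildIndex {
    public static void main(String[] args) throws Exception {
        // Placeholder index location.
        Directory dir = FSDirectory.getDirectory("/data/lucene-index");

        // 'true' creates the index from scratch.
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);

        // The tuning factors mentioned above.
        writer.setMergeFactor(10);        // number of segments merged at a time
        writer.setMaxFieldLength(10);     // index at most 10 terms per field
        writer.setMaxMergeDocs(5000000);  // cap on documents per merged segment

        // Each document carries a single short field of 5-10 words.
        Document doc = new Document();
        doc.add(new Field("text", "a few short words per document",
                          Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(doc);

        // Merge segments and flush before closing.
        writer.optimize();
        writer.close();
    }
}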