a billion? Wow! First, I really, really, really doubt you can use a
RAMDirectory to index a billion documents. I'd be interested in the
parameters of your problem if you could share them. I'd be especially
interested in providing a home for any of your old hardware, since I bet it
beats mine all to hell <G>.

Second, you'll just have to experiment with mergeFactor and maxBufferedDocs
to see how high you can set them on your particular machine (using an
FSDirectory). Then back those values off by a pretty major amount (I'd
recommend maybe a quarter of the settings that run out of memory on your
test data set). The last thing you want is to cruise through 900,000,000
documents and then hit the particular set of data that causes an
exception...
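
Something like this is what I mean. Untested, written against the Lucene
2.x-style API, and the path and numbers are made up, so treat it as a
sketch, not gospel:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class TunedIndexer {
    public static void main(String[] args) throws Exception {
        // Index on disk, not in RAM.
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/path/to/index"), // made-up path
                new StandardAnalyzer(),
                true); // create a fresh index

        // Say 400 buffered docs blew up on your test set; back off to ~1/4.
        writer.setMaxBufferedDocs(100);
        writer.setMergeFactor(10);

        Document doc = new Document();
        doc.add(new Field("body", "some document text",
                Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(doc); // loop over your real documents here

        writer.close();
    }
}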

Actually, I'd guess you're better off indexing the documents in smaller
groups and combining the indexes afterwards. That way you put an upper
limit on how much work is lost if anything goes wrong, and you can
parallelize the index creation process, assuming you have the hardware.

Best
Erick
