Hi,
I have processed approximately 50 million documents so far. I set mergeFactor
and maxBufferedDocs to 1000; I arrived at that value after several rounds of
test runs.
The indexing rate over those 50 M documents is about one document per 4.85 ms.

I am only using FSDirectory. Is there any other way to reduce this time?
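
(For reference, a minimal sketch of the setup described above, against the
Lucene IndexWriter API of that era; the index path, analyzer, and field
contents below are placeholders, not my real ones.)

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // FSDirectory-backed writer with the tuned buffering/merge settings.
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/data/index", true), // true = create a new index
                new StandardAnalyzer(),
                true);
        writer.setMergeFactor(1000);      // merge segments less frequently
        writer.setMaxBufferedDocs(1000);  // buffer more documents in RAM before flushing

        // Placeholder document; the real loop adds one per source record.
        Document doc = new Document();
        doc.add(new Field("contents", "sample text", Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(doc);

        writer.optimize();
        writer.close();
    }
}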

With regards,


On 6/13/06, Erick Erickson <[EMAIL PROTECTED]> wrote:

a billion? Wow! First, I really, really, really doubt you can use a RAMDir
to index a billion documents. I'd be interested in the parameters of your
problem if you could share them. I'd be especially interested in providing a
home for any of your old hardware, since I bet it beats mine all to hell <G>.

Second, you'll just have to play with the MergeFactor and MaxBufferedDocs to
see how high you can set them on your particular machine (using an FSDir).
Then reduce those factors by a pretty major amount (I'd recommend maybe 1/4 of
the size that runs out of memory on your test data set). The last thing you
want to do is cruise through 900,000,000 documents and find the particular
set of data that causes an exception...

Actually, I'd guess you're better off indexing the documents in smaller
groups and combining the indexes afterwards. That way, you place an upper
limit on how much work is lost if you have any problems, and you can also
parallelize the index creation process, assuming you have the hardware.

Best
Erick

