I want to index 1 billion documents. what do you think which one (i mean
using fsDir or ramDir) is suitable for indexing these many documents.


On 6/12/06, Flik Shen <[EMAIL PROTECTED]> wrote:


It means that to pick both high maxBufferedDocs and mergeFator will
improve your indexing performance.
But if too high, it will lead you to an OutOfMemberException.exception.
And if you set mergeFactor too high will also lead you to problem "open
too many files".
So you should pick proper values according to your box physical setting.

> -----Original Message-----
> From: vipin sharma [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 12, 2006 12:31 PM
> To: java-user@lucene.apache.org; Otis Gospodnetic
> Subject: Re: IndexWriter.addIndexes & optimizatio
>
>  - > Just set your maxBufferedDocs to as high a number as your
RAM/heap will
> let you, and pick a mergeFactor that is high, but doesn't get you in
trouble
> with open files.
>
> can you please explaing this in brief??
>
> regards and thanks,
>
> On 6/9/06, Otis Gospodnetic < [EMAIL PROTECTED]> wrote:
> >
> > When writing a unit test that comapres RAMDirectory and FSDirectory
> > performance for Lucene in Action I had a very hard time showing that
> > RAMDirectory really is faster. :)  Just set your maxBufferedDocs to
as high
> > a number as your RAM/heap will let you, and pick a mergeFactor that
is high,
> > but doesn't get you in trouble with open files.
> >
> > Otis
> >
> > ----- Original Message ----
> > From: Dan Armbrust <[EMAIL PROTECTED]>
> > To: java-user@lucene.apache.org
> > Sent: Wednesday, June 7, 2006 4:05:49 PM
> > Subject: Re: IndexWriter.addIndexes & optimization
> >
> > Benjamin Stein wrote:
> >
> > >
> > > I could probably store the little RAMDirectories to disk as many
> > > FSDirectories, and then addIndexes() of *all* the FSDirectories at
the
> > end
> > > instead of every time.  That would probably be smart.
> > >
> > > Glad I asked myself!
> > >
> >
> > That was what I was going to suggest - you may also want to
benchmark to
> > see if the RAMDirectory is buying you anything.  With the data that
I am
> > indexing on my hardware, I found it to be faster to index to a
regular
> > FSDirectory that it is to use the RAMDirectory.  Especially if you
tweak
> > the performance knobs on the indexer so it does its own caching
before
> > it writes to the Directory.
> >
> > I do batches of documents to FSDirectories - and then merge all of
the
> > FSDirectories into a new master index at the end - so I never have
to
> > optimize during the indexing process.
> >
> > Dan
> >
> >
> > --
> > ****************************
> > Daniel Armbrust
> > Biomedical Informatics
> > Mayo Clinic Rochester
> > daniel.armbrust(at)mayo.edu
> > http://informatics.mayo.edu/
> >
> >
---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
> >
> >
> >
> >
---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >

**************** CAUTION - Disclaimer *****************
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended
solely for the use of the addressee(s). If you are not the intended
recipient, please notify the sender by e-mail and delete the original
message. Further, you are not to copy, disclose, or distribute this e-mail
or its contents to any other person and any such actions are unlawful. This
e-mail may contain viruses. Infosys has taken every reasonable precaution to
minimize this risk, but is not liable for any damage you may sustain as a
result of any virus in this e-mail. You should carry out your own virus
checks before opening the e-mail or attachment. Infosys reserves the right
to monitor and review the content of all messages sent to or from this
e-mail address. Messages sent to or from this e-mail address may be stored
on the Infosys e-mail system.
***INFOSYS******** End of Disclaimer ********INFOSYS***

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Reply via email to