My understanding of the IndexWriter code is that it more or less manages this for you. It has an internal RAMDirectory which it uses to index in memory and then periodically flushes to disk based on your merge factor settings (amongst other settings). So I am not sure if the extra work you are doing is really saving you anything. Once all this is done, then you can call optimize just once at the end (or wherever you feel you need to)

Benjamin Stein wrote:
On 6/7/06, Benjamin Stein <[EMAIL PROTECTED]> wrote:

During indexing, I have been using a RAMDirectory to store many thousands
of documents in memory before flushing the buffer to disk using
IndexWriter.addIndexes.
For the most part this works very well, except that performance degrades
tremendously over time due to the implicit call (or two!)
to optimize() inside the addIndexes function.



I could probably store the little RAMDirectories to disk as many
FSDirectories, and then addIndexes() of *all* the FSDirectories at the end
instead of every time.  That would probably be smart.

Glad I asked myself!


--

Grant Ingersoll Sr. Software Engineer Center for Natural Language Processing Syracuse University School of Information Studies 335 Hinds Hall Syracuse, NY 13244 http://www.cnlp.org Voice: 315-443-5484 Fax: 315-443-6886

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to