My understanding of the IndexWriter code is that it more or less manages
this for you. It has an internal RAMDirectory which it uses to index in
memory and then periodically flushes to disk based on your merge factor
settings (amongst other settings). So I am not sure if the extra work
you are doing is really saving you anything. Once all this is done,
then you can call optimize just once at the end (or wherever you feel
you need to)
Benjamin Stein wrote:
On 6/7/06, Benjamin Stein <[EMAIL PROTECTED]> wrote:
During indexing, I have been using a RAMDirectory to store many
thousands
of documents in memory before flushing the buffer to disk using
IndexWriter.addIndexes.
For the most part this works very well, except that performance degrades
tremendously over time due to the implicit call (or two!)
to optimize() inside the addIndexes function.
I could probably store the little RAMDirectories to disk as many
FSDirectories, and then addIndexes() of *all* the FSDirectories at the
end
instead of every time. That would probably be smart.
Glad I asked myself!
--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]