> This is about multiple sessions with the writer. I.e., open the writer,
> update a few docs, close. Do the same again, but that 2nd session
> cannot overwrite the same files from the first one, since readers may
> have those files open. The "write once" model gives Lucene its
> transactional semantics
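A minimal sketch of the "write once" idea in plain Java. This is not Lucene's actual file format. Each update session writes its changes to a new file whose name carries the next generation number. The file is opened with StandardOpenOption.CREATE_NEW, so the write fails rather than overwrite a file that a reader may still have open. The updates_N naming scheme and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical sketch: every session writes a new generation file; old files are never touched. */
public class WriteOnceSessions {
    static final String PREFIX = "updates_"; // hypothetical naming scheme: updates_0, updates_1, ...

    /** Returns the next unused generation number in dir (0 if no session has committed yet). */
    static long nextGeneration(Path dir) throws IOException {
        long max = -1;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, PREFIX + "*")) {
            for (Path p : files) {
                String name = p.getFileName().toString();
                max = Math.max(max, Long.parseLong(name.substring(PREFIX.length())));
            }
        }
        return max + 1;
    }

    /** Commits one session's updates to a brand-new file; CREATE_NEW throws if the file already exists. */
    static Path commitSession(Path dir, List<String> updates) throws IOException {
        Path file = dir.resolve(PREFIX + nextGeneration(dir));
        Files.write(file, updates, StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sessions");
        commitSession(dir, List.of("doc1=a", "doc2=b")); // session 1 -> updates_0
        commitSession(dir, List.of("doc1=c"));           // session 2 -> updates_1
        System.out.println(nextGeneration(dir));         // prints 2
    }
}
```

If two writers race for the same generation, CREATE_NEW makes one of them fail instead of silently clobbering the other. A real implementation would still need a lock or a retry.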
Hi,
I am looking for a way to represent term frequency data in a vector
space, using unique integer identifiers for terms instead of strings. This
would allow feeding tools like LIBSVM from a Lucene index.
A small example: TermFreqVector.toString() produces "{TITLE: one/3,
two/4}". What I am looki
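One possible sketch, assuming the input is the parallel arrays returned by TermFreqVector.getTerms() and getTermFrequencies() in Lucene 2.x/3.x. A single term dictionary assigns each (field, term) pair the next integer id the first time it is seen. Each document is then emitted as a LIBSVM line of the form "label index:value ...", with indices starting at 1 and in ascending order. The class name TermIdMapper and the use of raw frequencies as feature values are assumptions. The same mapper instance must be reused for every document so that ids stay consistent across the whole data set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: stable integer ids for terms, plus LIBSVM-formatted rows. */
public class TermIdMapper {
    // "field:term" -> id; ids start at 1 because LIBSVM feature indices are 1-based
    private final Map<String, Integer> ids = new HashMap<>();

    /** Returns the id for (field, term), assigning the next free id if it is new. */
    public int idOf(String field, String term) {
        String key = field + ":" + term;
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(key, id);
        }
        return id;
    }

    /** Formats one document's term frequencies as "label id:freq id:freq ..." with ascending ids. */
    public String toLibsvm(int label, String field, String[] terms, int[] freqs) {
        TreeMap<Integer, Integer> row = new TreeMap<>(); // TreeMap keeps ids sorted
        for (int i = 0; i < terms.length; i++) {
            row.put(idOf(field, terms[i]), freqs[i]);
        }
        StringBuilder sb = new StringBuilder().append(label);
        for (Map.Entry<Integer, Integer> e : row.entrySet()) {
            sb.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TermIdMapper mapper = new TermIdMapper();
        // Same data as the "{TITLE: one/3, two/4}" example
        System.out.println(mapper.toLibsvm(1, "TITLE", new String[] {"one", "two"}, new int[] {3, 4}));
        // prints "1 1:3 2:4"
    }
}
```

The dictionary itself (id to term) should be saved alongside the output if the model's features need to be interpreted later.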
On Mon, Jan 18, 2010 at 12:35 AM, Babak Farhang wrote:
>> Got it. You'd presumably have to add a generation to this file, so
>> that multiple sessions of updates + commit write to private files
>> ("write once")? And then the reader merges all of them.
>
> Actually, I hadn't considered concurre
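On the read side, here is a hypothetical sketch of how a reader might merge all per-session update files. It does not describe how Lucene itself is implemented. The reader loads each generation's updates and applies them from oldest to newest, so a later commit overrides an earlier one for the same document. The map-of-maps input and the GenerationMerger name are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: combine every generation's updates; later generations win. */
public class GenerationMerger {
    /** generations: generation number -> (doc key -> value written in that session). */
    static Map<String, String> merge(Map<Long, Map<String, String>> generations) {
        Map<String, String> merged = new HashMap<>();
        // A TreeMap iterates its keys in ascending order, i.e. oldest generation first.
        for (Map<String, String> updates : new TreeMap<>(generations).values()) {
            merged.putAll(updates); // a newer session replaces older values for the same key
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<Long, Map<String, String>> gens = new HashMap<>();
        gens.put(1L, Map.of("doc1", "c"));
        gens.put(0L, Map.of("doc1", "a", "doc2", "b"));
        System.out.println(merge(gens).get("doc1")); // prints c
    }
}
```

Merging at open time keeps writers simple, but the cost grows with the number of generations. Periodically folding old generations into one file would bound that cost.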
There's a section on reusing Fields and Documents in
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed. And lots of
other good tips.
--
Ian.
On Mon, Jan 18, 2010 at 10:05 AM, Michael McCandless
wrote:
> It should give some perf improvement, reducing GC costs.
>
> But you don't need to se
It should give some perf improvement, reducing GC costs.
But you don't need to set field values to null -- just set them to the
values of the next doc to index, and index with that.
If your machine has available hardware concurrency, using threads
should give you even more gains.
Mike
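The following sketch illustrates the pattern described above in plain Java, so it runs without Lucene. Each worker thread allocates its record object once, then overwrites the record's fields with the next document's values without nulling them first. No thread shares its record with another. Record and the Consumer sink are hypothetical stand-ins. In Lucene 2.9/3.0 the corresponding pieces would presumably be a per-thread Document holding Field instances updated with Field.setValue(...) and passed to a shared IndexWriter via addDocument(doc); check the javadocs for your version. Reuse is only safe because the sink is assumed to be completely finished with the record before it returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical sketch: per-thread object reuse while feeding a thread-safe sink. */
public class ReuseAcrossThreads {
    /** Hypothetical stand-in for a Document with its Field objects. */
    static final class Record {
        String id;
        String body;
    }

    /** Feeds every row to sink using nThreads workers; returns the number of rows processed. */
    static long indexAll(List<String[]> rows, int nThreads, Consumer<Record> sink)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < nThreads; t++) {
                final int start = t;
                results.add(pool.submit(() -> {
                    Record rec = new Record(); // created once per thread, never shared
                    long n = 0;
                    for (int i = start; i < rows.size(); i += nThreads) {
                        rec.id = rows.get(i)[0];   // overwrite with the next doc's values;
                        rec.body = rows.get(i)[1]; // no need to null them out first
                        sink.accept(rec);          // sink must be done with rec when it returns
                        n++;
                    }
                    return n;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // also rethrows any worker failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(new String[] {"id" + i, "text " + i});
        }
        AtomicLong chars = new AtomicLong(); // thread-safe stand-in for a shared writer
        long n = indexAll(rows, 4, rec -> chars.addAndGet(rec.body.length()));
        System.out.println(n); // prints 1000
    }
}
```

Striding through the input (rows start, start + n, ...) keeps the split trivial. A shared work queue would balance the load better when document sizes vary a lot.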
On Mon, Jan 18
Hello all,
I am indexing millions of documents. The app is single-threaded. I need to
create Document and Field objects repeatedly. I am thinking of creating them
once and reusing them by setting the field values to null.
Is this advisable? Will it give any performance improvement?
Regards
Ganesh