Re: incremental document field update

2010-01-18 Thread Babak Farhang
> This is about multiple sessions with the writer. I.e., open writer, > update a few docs, close. Do the same again, but that 2nd session > cannot overwrite the same files from the first one, since readers may > have those files open. The "write once" model gives Lucene its > transactional semant

unique term identifiers

2010-01-18 Thread Solt, Illés
Hi, I am looking for a way to represent term frequency data in a vector space, using unique integer identifiers instead of strings. This would allow feeding tools like LIBSVM from a Lucene index. A small example: TermFreqVector.toString() produces "{TITLE: one/3, two/4}". What I am looki
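
To make the request above concrete, here is a minimal sketch of one way to map terms to unique integer identifiers and emit a sparse LIBSVM-style line. It does not use the Lucene API; the names TermDictionary and to_libsvm_line are hypothetical, and the term frequencies are assumed to have already been read from the index into a plain dict. It relies only on the LIBSVM text format, "label index:value ...", with indices in ascending order.

```python
class TermDictionary:
    """Assigns each distinct (field, term) pair a stable integer id.

    Hypothetical helper, not part of Lucene. Ids start at 1, following
    the usual LIBSVM convention for feature indices.
    """

    def __init__(self):
        self._ids = {}

    def id_for(self, field, term):
        key = (field, term)
        if key not in self._ids:
            # First time this term is seen: hand out the next free id.
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


def to_libsvm_line(label, field, term_freqs, dictionary):
    """Format one document's term frequencies as 'label id:freq id:freq ...'."""
    features = {dictionary.id_for(field, term): freq
                for term, freq in term_freqs.items()}
    # LIBSVM expects feature indices in ascending order.
    pairs = " ".join("%d:%d" % (i, features[i]) for i in sorted(features))
    return "%s %s" % (label, pairs)


if __name__ == "__main__":
    dictionary = TermDictionary()
    # Term frequencies as in the example "{TITLE: one/3, two/4}".
    print(to_libsvm_line(1, "TITLE", {"one": 3, "two": 4}, dictionary))
```

The same dictionary has to be shared across all documents (and kept for later use at prediction time) so that a given term always maps to the same id.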

Re: incremental document field update

2010-01-18 Thread Michael McCandless
On Mon, Jan 18, 2010 at 12:35 AM, Babak Farhang wrote: >> Got it. You'd presumably have to add a generation to this file, so >> that multiple sessions of updates + commit write to private files >> ("write once")? And then the reader merges all of them. > > Actually, I hadn't considered concurre
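
The "generation per session, write once, reader merges" idea quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Lucene's file format or API: the updates_N.json file naming, the JSON payload, and the functions write_session and read_merged are all hypothetical.

```python
import json
import os
import re

_GEN_PATTERN = re.compile(r"^updates_(\d+)\.json$")  # hypothetical file naming


def existing_generations(directory):
    """Return the generation numbers already present, in ascending order."""
    gens = []
    for name in os.listdir(directory):
        match = _GEN_PATTERN.match(name)
        if match:
            gens.append(int(match.group(1)))
    return sorted(gens)


def write_session(directory, updates):
    """Write one session's field updates to a new, private generation file.

    'updates' maps a document id (string) to a dict of field -> new value.
    Existing files are never overwritten, so readers holding older
    generations open are unaffected.
    """
    gens = existing_generations(directory)
    gen = gens[-1] + 1 if gens else 1
    path = os.path.join(directory, "updates_%d.json" % gen)
    with open(path, "x") as f:  # mode "x" fails if the file already exists
        json.dump(updates, f)
    return gen


def read_merged(directory):
    """Merge all generations; later generations win for the same field."""
    merged = {}
    for gen in existing_generations(directory):
        path = os.path.join(directory, "updates_%d.json" % gen)
        with open(path) as f:
            for doc_id, fields in json.load(f).items():
                merged.setdefault(doc_id, {}).update(fields)
    return merged
```

A reader that opened the directory before the second session still sees a consistent view from the first generation file, which is the property the write-once model is meant to preserve.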

Re: Reusing Document and Field objects

2010-01-18 Thread Ian Lea
There's a section on reusing Fields and Documents in http://wiki.apache.org/lucene-java/ImproveIndexingSpeed. And lots of other good tips. -- Ian. On Mon, Jan 18, 2010 at 10:05 AM, Michael McCandless wrote: > It should give some perf improvement, reducing GC costs. > > But you don't need to se

Re: Reusing Document and Field objects

2010-01-18 Thread Michael McCandless
It should give some perf improvement, reducing GC costs. But you don't need to set field values to null -- just set to the values of the next doc to index, and index with that. If your machine has available hardware concurrency, using threads should give you even more gains. Mike On Mon, Jan 18

Reusing Document and Field objects

2010-01-18 Thread Ganesh
Hello all, I am indexing millions of documents. The app is single threaded. I need to create Document and Field objects repeatedly. I am thinking of creating them once and reusing them by setting the field values to null. Is this advisable? Will it give any performance improvement? Regards Ganesh