Hello Otis,
> 
> Hello Ard,
> 
> What you are after is a higher mergeFactor and probably also 
> a higher maxBufferedDocs.  Is indexing performance the concern?

No, this is not what I am after, and a higher mergeFactor does not really solve
my issue. My issue is very similar to the thread "maxDocs and Arrays" (which I
only read later):
http://www.gossamer-threads.com/lists/lucene/java-user/49285

I also want to keep some derived data from Lucene in in-memory arrays, to
enable faceted authorized navigation in a Jackrabbit (JCR) repository. I have
tested with millions of "derived data documents" in an array and can compute
faceted authorized navigation very efficiently. But, of course, as the Lucene
index changes, I need to update my derived data. When adding a document to
Lucene, I can normally just append an item to my derived data array, unless:

1) Lucene did a merge, and
2) after the merge, writer.docCount() != writerDoccountBeforeUpdate + 1 (this
means the merge included a segment containing at least one deleted doc,
which reduced docCount)
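
As a minimal sketch of the append-or-rebuild decision in the two conditions
above, in plain Java with no Lucene dependency: the class and method names are
hypothetical, and the two counts are assumed to be writer.docCount() sampled
before and after adding exactly one document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene or Jackrabbit.
final class DerivedDataArray {
    private final List<String> items = new ArrayList<>();
    private boolean stale = false;

    // countBefore/countAfter stand in for writer.docCount() sampled
    // before and after adding exactly one document.
    // Returns true if the item could simply be appended.
    boolean onDocumentAdded(int countBefore, int countAfter, String item) {
        if (!stale && countAfter == countBefore + 1) {
            items.add(item);      // positions still line up: cheap append
            return true;
        }
        stale = true;             // a merge dropped deleted docs: positions shifted
        return false;
    }

    // Replaces the contents once the caller has re-read the index
    // (the expensive path that should happen as rarely as possible).
    void rebuild(List<String> freshItems) {
        items.clear();
        items.addAll(freshItems);
        stale = false;
    }

    boolean isStale() { return stale; }

    int size() { return items.size(); }
}
```

The sketch only compares the counts, on the assumption that the count can
differ from "before + 1" only when a merge has removed deleted documents.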

If 1 and 2 are both true, I need to recreate my derived data array, because the
array positions no longer coincide with Lucene's document numbers. Therefore, I
want to minimize merges (recreating the array is expensive), which of course
can be done, as you say, by setting a large mergeFactor (and, for example,
enabling the compound file format to reduce the number of files again) and a
large maxBufferedDocs. But increasing the default number of documents in the
"smallest" segments from 10 to, say, 100 would also help me.
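
To make the effect of the smallest segment size concrete, here is a rough
model in plain Java. It is a simplification and not Lucene's actual merge
code: it assumes a new segment is flushed every docsPerFlush documents and
that every mergeFactor segments of the same level are merged into one segment
of the next level. The class and method names are hypothetical.

```java
// Simplified model of a logarithmic merge scheme; not Lucene's actual code.
final class MergeCountModel {

    // Counts the merges triggered while adding totalDocs documents, assuming
    // one new segment per docsPerFlush documents and one merge per group of
    // mergeFactor same-level segments.
    static long countMerges(long totalDocs, int docsPerFlush, int mergeFactor) {
        if (totalDocs < 0 || docsPerFlush < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("invalid arguments");
        }
        long segments = totalDocs / docsPerFlush; // segments flushed at the lowest level
        long merges = 0;
        while (segments >= mergeFactor) {
            segments /= mergeFactor;  // each full group collapses into one segment
            merges += segments;       // one merge per segment created at this level
        }
        return merges;
    }
}
```

Under this model, adding 1000 documents with a mergeFactor of 10 triggers 11
merges when the smallest segments hold 10 documents, but only 1 merge when
they hold 100.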

Then again, I am not sure whether I am doing something that could be achieved
more effectively or more simply.

Thanks in advance for any pointers,

Regards, Ard Schrijvers


> Don't go crazy with setting a super high (e.g. 100+) 
> mergeFactor, unless you really have the number of open files 
> on your server(s) set to a solid/high number. maxBufferedDocs 
> can be set to a much higher number, typically, depending on 
> the size of the documents you are trying to index and the 
> amount of heap the JVM has to work with.  There is also a new 
> API for explicit flushes of in-memory documents while 
> indexing to control memory consumption.
> 
> Otis
> --
> Lucene Consulting -- http://lucene-consulting.com/
> 
> 
> ----- Original Message ----
> From: Ard Schrijvers <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Friday, May 25, 2007 8:40:26 AM
> Subject: RE: Setting the maximum number of documents in a 
> lucene segment
> 
> 
> > 
> > Hello,
> > 
> > I am trying to change the maximum number of documents in a 
> > lucene segment. By default it seems to be 10.
> 
> Correction: 10 for the smallest (just created) segments, of 
> course, because merged segments are obviously likely to 
> contain many more documents.
> 
> > When I have a 
> > mergeFactor of say 10, then on average, after every 100 added 
> > documents lucene is merging segments.
> > 
> > I want each segment to contain more than the default 10 
> > documents, because I need to minimize merging.
> > 
> > Is there a way to achieve this? 
> > writer.setMaxBufferedDocs(largeValue) does not do the trick 
> > (I think because, in my case, the writer is flushed and 
> > closed after a few updates).
> > 
> > Does anyone know whether it is possible to make the default 
> > number of documents a segment can contain larger?
> > 
> > Thanks in advance, 
> > 
> > Ard Schrijvers
> > 
> > 
> > -- 
> > 
> > Hippo
> > Oosteinde 11
> > 1017WT Amsterdam
> > The Netherlands
> > Tel  +31 (0)20 5224466
> > -------------------------------------------------------------
> > [EMAIL PROTECTED] / http://www.hippo.nl
> > -------------------------------------------------------------- 
> > 
> > 
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
