Re: "docMap" array in SegmentMergeInfo

2005-10-13 Thread Peter Keegan
Hi Yonik, Your patch has corrected the thread thrashing problem on multi-cpu systems. I've tested it with both 1.4.3 and 1.9. I haven't seen a 100X performance gain, but that's because I'm caching QueryFilters and Lucene is caching the sort fields. Thanks for the fast response! btw, I had previous

Re: "docMap" array in SegmentMergeInfo

2005-10-12 Thread Yonik Seeley
Here's the patch: http://issues.apache.org/jira/browse/LUCENE-454 It resulted in quite a performance boost indeed! On 10/12/05, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > Thanks for the trace Peter, and great catch! > It certainly does look like avoiding the construction of the docMap for a > Mu

Re: "docMap" array in SegmentMergeInfo

2005-10-12 Thread Yonik Seeley
Thanks for the trace Peter, and great catch! It certainly does look like avoiding the construction of the docMap for a MultiTermEnum will be a significant optimization. -Yonik Now hiring -- http://tinyurl.com/7m67g On 10/12/05, Peter Keegan <[EMAIL PROTECTED]> wrote: > > Here is one stack trace:
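A minimal sketch of the optimization Yonik describes, assuming the docMap is simply built on first request instead of eagerly. `LazySegmentInfo` and its constructor parameters are hypothetical, and this is not the LUCENE-454 patch itself; it only illustrates how a consumer that never asks for the map (such as a MultiTermEnum used at search time) would skip the per-document scan entirely.

```java
import java.util.function.IntPredicate;

// Hypothetical per-segment merge state (not Lucene's SegmentMergeInfo).
// The docMap is built the first time getDocMap() is called, so a consumer
// that never asks for it never pays for the per-document scan.
final class LazySegmentInfo {
    private final int maxDoc;
    private final boolean hasDeletions;
    private final IntPredicate isDeleted; // stand-in for reader.isDeleted(doc)
    private int[] docMap;                 // null until built, or if no deletions
    private boolean built;

    LazySegmentInfo(int maxDoc, boolean hasDeletions, IntPredicate isDeleted) {
        this.maxDoc = maxDoc;
        this.hasDeletions = hasDeletions;
        this.isDeleted = isDeleted;
        // No scan here, unlike an eagerly built map.
    }

    // Not thread-safe: assumes the enumerating thread owns this object.
    int[] getDocMap() {
        if (!built) {
            if (hasDeletions) {
                int[] map = new int[maxDoc];
                int next = 0;
                for (int i = 0; i < maxDoc; i++) {
                    map[i] = isDeleted.test(i) ? -1 : next++;
                }
                docMap = map;
            }
            built = true;
        }
        return docMap;
    }
}
```

Constructing a `LazySegmentInfo` makes zero `isDeleted` calls; the first `getDocMap()` makes `maxDoc` calls and later calls reuse the cached array.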

Re: "docMap" array in SegmentMergeInfo

2005-10-12 Thread Peter Keegan
Here is one stack trace:
Full thread dump Java HotSpot(TM) Client VM (1.5.0_03-b07 mixed mode):
"Thread-6" prio=5 tid=0x6cf7a7f0 nid=0x59e50 waiting for monitor entry [0x6d2cf000..0x6d2cfd6c]
    at org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:241)
    - waiting to lock <0x04e40278>

Re: "docMap" array in SegmentMergeInfo

2005-10-11 Thread Dawid Weiss
I'm pretty sure it doesn't solve the problem in general (it certainly isn't a thread-safe solution; you mentioned the memory barrier, and I'd add compiler optimizations). If it works it must be something application-specific, maybe synchronization isn't really needed there, or you just don't do an

Re: "docMap" array in SegmentMergeInfo

2005-10-11 Thread Yonik Seeley
> We've been using this in production for a while and it fixed the > extremely slow searches when there are deleted documents. Who was the caller of isDeleted()? There may be an opportunity for an easy optimization to grab the BitVector and reuse it instead of repeatedly calling isDeleted() on the
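Yonik's suggestion of grabbing the deletion bits once, rather than calling `isDeleted()` per document, can be sketched as follows. Assumptions: `java.util.BitSet` stands in for Lucene's BitVector, and `SnapshotReader` / `deletedDocsSnapshot()` are hypothetical names, not SegmentReader methods. The reader's lock is entered once per scan instead of once per document.

```java
import java.util.BitSet;

// Hypothetical reader; java.util.BitSet stands in for Lucene's BitVector.
final class SnapshotReader {
    private final int maxDoc;
    private final BitSet deletedDocs = new BitSet();

    SnapshotReader(int maxDoc) { this.maxDoc = maxDoc; }

    int maxDoc() { return maxDoc; }

    synchronized void delete(int doc) { deletedDocs.set(doc); }

    // Per-document check: every call enters the monitor.
    synchronized boolean isDeleted(int doc) { return deletedDocs.get(doc); }

    // One monitor entry returns a private copy of the deletion bits.
    synchronized BitSet deletedDocsSnapshot() { return (BitSet) deletedDocs.clone(); }
}

final class SnapshotDocMap {
    // Builds old-to-new document numbers, taking the reader's lock once.
    static int[] build(SnapshotReader reader) {
        BitSet deleted = reader.deletedDocsSnapshot();
        int[] docMap = new int[reader.maxDoc()];
        int next = 0;
        for (int i = 0; i < docMap.length; i++) {
            docMap[i] = deleted.get(i) ? -1 : next++; // no locking inside the loop
        }
        return docMap;
    }
}
```

Cloning is a design choice in this sketch: it costs one copy of the bits but gives the loop a consistent view even if another thread deletes a document mid-scan. Handing out the live bit set without copying would be cheaper but raises the visibility questions discussed in the replies about memory barriers.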

Re: "docMap" array in SegmentMergeInfo

2005-10-11 Thread Yonik Seeley
I'm not sure that looks like a safe patch. Synchronization does more than help prevent races... it also introduces memory barriers. Removing synchronization around objects that can change is very tricky business (witness the double-checked locking antipattern). -Yonik
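The double-checked locking antipattern is the lazy initialization below written without the `volatile` modifier: a second thread can observe a non-null reference before the writes that initialized the object are visible to it. This sketch assumes the Java 5 memory model (JSR-133), under which a `volatile` field makes the pattern safe; `Lazy` is a hypothetical helper, not Lucene code.

```java
import java.util.function.Supplier;

// Double-checked lazy initialization done safely for the Java 5+ memory
// model. Hypothetical helper, not Lucene code.
final class Lazy<T> {
    private final Supplier<T> factory;
    // volatile supplies the memory barrier: without it, a reader thread
    // could see a non-null reference before the object's state is visible.
    private volatile T value;

    Lazy(Supplier<T> factory) { this.factory = factory; }

    T get() {
        T result = value;                 // first check, no lock
        if (result == null) {
            synchronized (this) {
                result = value;           // second check, under the lock
                if (result == null) {
                    result = factory.get();
                    value = result;       // volatile write publishes the object
                }
            }
        }
        return result;
    }
}
```

The factory runs at most once no matter how many threads call `get()` concurrently, and only the first callers ever take the lock.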

Re: "docMap" array in SegmentMergeInfo

2005-10-11 Thread Chris Lamprecht
Hi Peter, I observed the same issue on a multiprocessor machine. I included a small fix for this in the NIO patch (against the 1.9 trunk) here: http://issues.apache.org/jira/browse/LUCENE-414#action_12322523 The change amounts to the following methods in SegmentReader.java, to remove the need s

Re: "docMap" array in SegmentMergeInfo

2005-10-11 Thread Peter Keegan
> If the index is in 'search/read-only' mode, is there a way around this bottleneck? The obvious answer (to answer my own question) is to optimize the index. But the question remains: why is the docMap created and never used? Peter

Re: "docMap" array in SegmentMergeInfo

2005-10-11 Thread Peter Keegan
On a multi-cpu system, this loop to build the docMap array can cause severe thread thrashing because of the synchronized method 'isDeleted'. I have observed this on an index with over 1 million documents (which contains a few thousand deleted docs) when multiple threads perform a search with either
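The per-call locking Peter describes can be sketched like this. `LockedReader` and `ContentionDemo` are hypothetical; the code only counts monitor entries (threads × maxDoc) rather than measuring thrashing directly, but it shows why every concurrent search funnels through the same lock while building its own docMap.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical reader mimicking a synchronized isDeleted (not Lucene code).
final class LockedReader {
    private final boolean[] deleted;
    final AtomicLong monitorEntries = new AtomicLong();

    LockedReader(boolean[] deleted) { this.deleted = deleted.clone(); }

    int maxDoc() { return deleted.length; }

    synchronized boolean isDeleted(int doc) {
        monitorEntries.incrementAndGet(); // one lock acquisition per call
        return deleted[doc];
    }
}

final class ContentionDemo {
    // Starts `threads` searches, each building its own docMap through the
    // shared synchronized isDeleted; returns the total monitor entries.
    static long run(LockedReader reader, int threads) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                int[] docMap = new int[reader.maxDoc()];
                int next = 0;
                for (int i = 0; i < docMap.length; i++) {
                    docMap[i] = reader.isDeleted(i) ? -1 : next++;
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return reader.monitorEntries.get();
    }
}
```

With a 1 million document index and several concurrent searches, this pattern means millions of acquisitions of a single monitor per round of queries.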

Re: "docMap" array in SegmentMergeInfo

2005-07-13 Thread Doug Cutting
Lokesh Bajaj wrote: For a very large index where we might want to delete/replace some documents, this would require a lot of memory (for 100 million documents, this would need 381 MB of memory). Is there any reason why this was implemented this way? In practice this has not been an issue. A
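Checking the 381 MB figure: the docMap holds one Java `int` (4 bytes) per document, so the estimate ignores only the small array header. The helper names below are hypothetical.

```java
// Back-of-the-envelope size of a docMap: one 4-byte Java int per document.
// Array header overhead (a few bytes) is ignored. Hypothetical helper names.
final class DocMapMemory {
    static long docMapBytes(long maxDoc) {
        return maxDoc * Integer.BYTES;
    }

    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0); // 1 MiB = 1,048,576 bytes
    }
}
```

`docMapBytes(100_000_000L)` gives 400,000,000 bytes, and `toMiB` of that is about 381.47, so the quoted "381 MB" is in binary megabytes.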

"docMap" array in SegmentMergeInfo

2005-07-13 Thread Lokesh Bajaj
I noticed the following code that builds the "docMap" array in SegmentMergeInfo.java for the case where some documents might be deleted from an index:
    // build array which maps document numbers around deletions
    if (reader.hasDeletions()) {
      int maxDoc = reader.maxDoc();
      docM
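A self-contained sketch of what such a loop computes, under stated assumptions: `DeletionView` is a hypothetical interface exposing only `maxDoc()`, `hasDeletions()` and `isDeleted()`, and deleted documents are marked `-1`. It illustrates what the array holds (old document number to new number once deletions are squeezed out); it is not a verbatim copy of SegmentMergeInfo.java.

```java
import java.util.BitSet;

// Hypothetical stand-in for the parts of a segment reader the docMap loop
// uses; these are not Lucene's actual types.
interface DeletionView {
    int maxDoc();
    boolean hasDeletions();
    boolean isDeleted(int doc);
}

final class DocMapBuilder {
    // Maps each old document number to its new number once deleted
    // documents are removed; deleted documents map to -1.
    // Returns null when there are no deletions (identity mapping).
    static int[] buildDocMap(DeletionView reader) {
        if (!reader.hasDeletions()) {
            return null;
        }
        int maxDoc = reader.maxDoc();
        int[] docMap = new int[maxDoc];   // one int per document, live or not
        int next = 0;
        for (int i = 0; i < maxDoc; i++) {
            docMap[i] = reader.isDeleted(i) ? -1 : next++;
        }
        return docMap;
    }

    // Convenience factory backed by java.util.BitSet, for trying it out.
    static DeletionView fromBits(int maxDoc, BitSet deleted) {
        return new DeletionView() {
            public int maxDoc() { return maxDoc; }
            public boolean hasDeletions() { return !deleted.isEmpty(); }
            public boolean isDeleted(int doc) { return deleted.get(doc); }
        };
    }
}
```

For 5 documents with 1 and 3 deleted, the map is `[0, -1, 1, -1, 2]`. The array is sized by `maxDoc`, not by the number of deletions, which is why its memory cost grows with index size.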