Re: search performance

2014-06-02 Thread Christoph Kaser
Can you take thread stacktraces (repeatedly) during those 5 minute searches? That might give you (or someone on the mailing list) a clue where all that time is spent. You could try using jstack for that: http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html Regards Christoph
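If attaching jstack to the process is awkward, repeated dumps can also be collected from inside the JVM. The following is a minimal sketch, assuming the sampling code can run in the same process as the searches; the class name `StackSampler` and its parameters are hypothetical and not part of the thread. It relies only on the standard `java.lang.management` API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects several stack dumps of all live threads.
final class StackSampler {

    /** Takes up to `samples` dumps of every live thread, `intervalMillis` apart. */
    static List<String> sample(int samples, long intervalMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> dumps = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            StringBuilder dump = new StringBuilder();
            // false, false: skip lock details, stack frames are enough here
            for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                dump.append('"').append(info.getThreadName()).append("\" ")
                    .append(info.getThreadState()).append('\n');
                for (StackTraceElement frame : info.getStackTrace()) {
                    dump.append("    at ").append(frame).append('\n');
                }
            }
            dumps.add(dump.toString());
            if (i < samples - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop sampling early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return dumps;
    }

    public static void main(String[] args) {
        for (String dump : sample(3, 1000)) {
            System.out.println(dump);
            System.out.println("-----");
        }
    }
}
```

Frames that show up in most samples of the searching thread are the likely place where the time goes.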

Re: search performance

2014-06-02 Thread Jamie
Toke, thanks for the comment. Unfortunately, in this instance, it is a live production system, so we cannot conduct experiments. The number is definitely accurate. We have many different systems with a similar load that observe the same performance issue. To my knowledge, the Lucene integrati

Re: search performance

2014-06-02 Thread Toke Eskildsen
On Mon, 2014-06-02 at 08:51 +0200, Jamie wrote: [200GB, 150M documents] > With NRT enabled, search speed is roughly 5 minutes on average. > The server resources are: > 2x6 Core Intel CPU, 128GB, 2 SSD for index and RAID 0, with Linux. 5 minutes is extremely long. Is that really the right number

Possible order violation in lucene library version 2.4.1

2014-06-02 Thread Swarnendu Biswas
Hi, I am working on a research project on data race detection, and am using the DaCapo benchmarks for evaluation. I am using the benchmark lusearch from the 2009 suite, which uses lucene library 2.4.1. For one test case, I am monitoring a pair of accesses say,  Lorg/apache/lucene/store/Dire

Re: search performance

2014-06-02 Thread Tri Cao
This is an interesting performance problem and I think there is probably not a single answer here, so I'll just lay out the steps I would take to tackle this: 1. What is the variance of the query latency? You said the average is 5 minutes, but is it due to some really bad queries or most queries h
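One way to answer the variance question is to compare the mean against the median and a high percentile of the recorded query times. The sketch below is only an illustration: the class `LatencyStats` is hypothetical, it uses the nearest-rank percentile definition, and the sample values in `main` are made up so that the mean comes out at 5 minutes. They are not measurements from the system discussed here.

```java
import java.util.Arrays;

// Hypothetical helper for summarising query latencies given in milliseconds.
final class LatencyStats {

    /** Arithmetic mean of the samples. */
    static double mean(long[] millis) {
        if (millis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double sum = 0;
        for (long m : millis) {
            sum += m;
        }
        return sum / millis.length;
    }

    /** Nearest-rank percentile, with p in the range (0, 100]. */
    static long percentile(long[] millis, double p) {
        if (millis.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up samples: four fast queries and one very slow one.
        long[] samples = {1000L, 2000L, 3000L, 4000L, 1490000L};
        System.out.println("mean ms   = " + mean(samples));
        System.out.println("median ms = " + percentile(samples, 50));
        System.out.println("p95 ms    = " + percentile(samples, 95));
    }
}
```

A median far below the mean points to a few pathological queries, while a median close to the mean suggests that most queries are slow.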

Re: search performance

2014-06-02 Thread Jamie
I assume you meant 1000 documents. Yes, the page size is in fact configurable. However, it only obtains the page size * 3. It preloads the following and previous page too. The point is, it only obtains the documents that are needed. On 2014/06/02, 3:03 PM, Tincu Gabriel wrote: My bad, It's u
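The "page size * 3" behaviour described above (current page plus the previous and following page) can be sketched as a small window calculation. This is an illustration under assumptions, not the application's actual code: the class `PageWindow`, the zero-based page index, and the clamping to the total hit count are all hypothetical.

```java
// Hypothetical helper: range of hits to load for a page plus its two neighbours.
final class PageWindow {
    final int start; // index of the first hit to load (inclusive)
    final int end;   // index one past the last hit to load (exclusive)

    private PageWindow(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /**
     * Window covering the previous, current and next page for a zero-based
     * page index, clamped to the range [0, totalHits].
     */
    static PageWindow around(int page, int pageSize, int totalHits) {
        if (page < 0 || pageSize <= 0 || totalHits < 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        // long arithmetic avoids int overflow for very large page numbers
        long end = Math.min((long) totalHits, ((long) page + 2) * pageSize);
        long start = Math.min(end, Math.max(0L, ((long) page - 1) * pageSize));
        return new PageWindow((int) start, (int) end);
    }

    public static void main(String[] args) {
        PageWindow w = PageWindow.around(5, 50, 1000);
        System.out.println("load hits [" + w.start + ", " + w.end + ")");
    }
}
```

In the middle of a result set the window spans exactly three pages; at either end it shrinks because there is no previous or next page to preload.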

Re: search performance

2014-06-02 Thread Tincu Gabriel
My bad, it's using the RAMDirectory as a cache and a delegate directory that you pass in the constructor to do the disk operations, limiting the use of the RAMDirectory to files that fit a certain size. So I guess the underlying Directory implementation will be whatever you choose it to be. I'd sti

Re: search performance

2014-06-02 Thread Jamie
I was under the impression that NRTCachingDirectory will instantiate an MMapDirectory if a 64-bit platform is detected? Is this not the case? On 2014/06/02, 2:09 PM, Tincu Gabriel wrote: MMapDirectory will do the job for you. RAMDirectory has a big warning in the class description stating that

Re: search performance

2014-06-02 Thread Tincu Gabriel
MMapDirectory will do the job for you. RAMDirectory has a big warning in the class description stating that the performance will get killed by an index larger than a few hundred MB, and NRTCachingDirectory is a wrapper for RAMDirectory and suitable for low update rates. MMap will use the system RAM

Re: MultiReader docid reliability

2014-06-02 Thread Nicola Buso
Hi Erick, the good reason for now is caching: we use them to store the results in cache, and I wanted a better explanation of "ephemeral" to understand the possible life of the cache. From the answers, ephemeral can be related to the opening of the indexreader (in general for precaution) and all

Re: search performance

2014-06-02 Thread Jamie
Jack, first off, thanks for applying your mind to our performance problem. On 2014/06/02, 1:34 PM, Jack Krupansky wrote: Do you have enough system memory to fit the entire index in OS system memory so that the OS can fully cache it instead of thrashing with I/O? Do you see a lot of I/O or are t

Re: search performance

2014-06-02 Thread Jack Krupansky
Do you have enough system memory to fit the entire index in OS system memory so that the OS can fully cache it instead of thrashing with I/O? Do you see a lot of I/O or are the queries compute-bound? You said you have a 128GB machine, so that sounds small for your index. Have you tried a 256GB

Re: search performance

2014-06-02 Thread Jamie
Tom, thanks for the offer of assistance. On 2014/06/02, 12:02 PM, Tincu Gabriel wrote: What kind of queries are you pushing into the index. We are indexing regular emails + attachments. Typical query is something like: filter: to:mbox08 from:mbox08 cc:mbox08 bcc:mbox08 deliver

Re: search performance

2014-06-02 Thread Tincu Gabriel
What kind of queries are you pushing into the index? Do they match a lot of documents? Do you do any sorting on the result set? What is the average document size? Do you have a lot of update traffic? What kind of schema does your index use? On Mon, Jun 2, 2014 at 6:51 AM, Jamie wrote: > Gre

Re: remapping docIds in a read only offline built index

2014-06-02 Thread Olivier Binda
Very nice! That is exactly what I needed. Thank you very much! On 06/02/2014 09:26 AM, Michael McCandless wrote: The index sorting APIs (in lucene/misc) can do this. E.g. you could make a SortingAtomicReader, with your sort criteria, then use addIndexes(IR[]) to add it to a new index. That

Re: remapping docIds in a read only offline built index

2014-06-02 Thread Michael McCandless
The index sorting APIs (in lucene/misc) can do this. E.g. you could make a SortingAtomicReader, with your sort criteria, then use addIndexes(IR[]) to add it to a new index. That resulting index would have 1 segment and the docIDs would be in your order. Mike McCandless http://blog.mikemccandles

Re: remapping docIds in a read only offline built index

2014-06-02 Thread Olivier Binda
Hello, I'm still interested in having the answer to the following question: In a 1-segment read-only index (that is built offline once and then frozen), is it possible to remap the docIds? I may have a (working but not optimal) answer to my original problem: I may use a MultiReader and 3