RE: In memory Lucene configuration

Doron Yaacoby Sun, 15 Jul 2012 23:32:42 -0700

Another interesting fact I just found out.
Up until now I measured query execution time via my application. Meaning, the 
application would log each query it sends to Lucene and the time it takes to 
run it. The nature of my application is that there will be a variable number of 
Lucene queries per second (2-3 usually, but could be more or less), so there 
isn't constant 'pressure' on Lucene.
I have now created a new test that runs the same queries independently of my 
application. This achieved much better results: ~17ms with the MMap 
implementation and ~19ms with RAMDirectory. Moreover, the results are now 
reproducible, meaning there aren't any spikes in the query times. When running 
through my application scenario, I got the occasional spike, where a query took 
2-3 seconds. In the MMap case, I guess it could be that the OS sees some caches 
as unused for a while and reclaims them? I can't really explain this phenomenon 
in the RAMDirectory case.

I'm currently trying to recreate this by sleeping a random time before each 
query, but still without success. Will update...
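
One way to mimic the application's irregular load in a standalone test is 
sketched below. It is only a sketch under assumptions: the class 
IdleGapLatencyProbe, its method names, the seed and the 1-second spike 
threshold are made up for illustration, and each Runnable stands in for 
whatever code actually runs a Lucene query and loads its documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical harness (not part of Lucene): idles a random time before each timed query. */
class IdleGapLatencyProbe {

    /**
     * Runs every query once, sleeping between 0 and maxIdleMillis ms before each one,
     * and returns the per-query latencies in nanoseconds (idle time is not counted).
     */
    static long[] measure(List<Runnable> queries, int maxIdleMillis, long seed) {
        if (maxIdleMillis < 0) {
            throw new IllegalArgumentException("maxIdleMillis must be >= 0");
        }
        Random random = new Random(seed);
        long[] latenciesNanos = new long[queries.size()];
        for (int i = 0; i < latenciesNanos.length; i++) {
            sleepQuietly(random.nextInt(maxIdleMillis + 1));
            long start = System.nanoTime();
            queries.get(i).run();
            latenciesNanos[i] = System.nanoTime() - start;
        }
        return latenciesNanos;
    }

    /** Counts the samples that are strictly above the given threshold. */
    static int countSpikes(long[] latenciesNanos, long thresholdNanos) {
        int spikes = 0;
        for (long latency : latenciesNanos) {
            if (latency > thresholdNanos) {
                spikes++;
            }
        }
        return spikes;
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while idling", e);
        }
    }

    public static void main(String[] args) {
        List<Runnable> queries = new ArrayList<Runnable>();
        for (int i = 0; i < 20; i++) {
            queries.add(new Runnable() {
                public void run() {
                    // Placeholder: a real test would execute one search here.
                }
            });
        }
        long[] latencies = measure(queries, 500, 42L);
        long oneSecondNanos = 1_000_000_000L; // assumed spike threshold
        System.out.println("spikes over 1s: " + countSpikes(latencies, oneSecondNanos)
                + " of " + latencies.length + " queries");
    }
}
```

Keeping the sleep outside the timed region means only the query itself is 
measured, so any spike that shows up after a long gap points at idle-related 
effects rather than at the harness.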

-----Original Message-----
From: Doron Yaacoby [mailto:dor...@gingersoftware.com] 
Sent: 15 July 2012 13:40
To: java-user@lucene.apache.org; simon.willna...@gmail.com
Subject: RE: In memory Lucene configuration

Thanks for the quick input!
I ran a few more tests with your suggested configuration (-Xmx1G -Xms1G with 
MMapDirectory). The third time I ran the same test I finally got an 
improvement: an average of ~30ms per query, although that's still not as fast 
as I need it to be. 
The test contains about 2200 different queries (well, some are repeated twice 
or thrice), and includes search time and doc loading (reading the two fields I 
mentioned). The queries are all straight boolean conjunctions, and yes, I am 
dropping the first few queries when calculating averages.
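
Since an average can hide the occasional multi-second outlier, it may help to 
report a high percentile and the maximum alongside the mean once the warm-up 
queries are dropped. A minimal sketch follows; LatencySummary and its methods 
are hypothetical names, the percentile uses the simple nearest-rank 
definition, and the sample values in main are invented, not measurements from 
this thread.

```java
import java.util.Arrays;

/** Hypothetical helper: summarises query latencies after discarding warm-up samples. */
class LatencySummary {

    /** Mean of all samples from index 'warmup' onwards. */
    static double meanAfterWarmup(long[] latenciesMillis, int warmup) {
        checkWarmup(latenciesMillis, warmup);
        long sum = 0;
        for (int i = warmup; i < latenciesMillis.length; i++) {
            sum += latenciesMillis[i];
        }
        return (double) sum / (latenciesMillis.length - warmup);
    }

    /** Nearest-rank percentile (0 < percentile <= 100) of the samples after warm-up. */
    static long percentileAfterWarmup(long[] latenciesMillis, int warmup, double percentile) {
        checkWarmup(latenciesMillis, warmup);
        if (percentile <= 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in (0, 100]");
        }
        long[] sorted = Arrays.copyOfRange(latenciesMillis, warmup, latenciesMillis.length);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    private static void checkWarmup(long[] latenciesMillis, int warmup) {
        if (warmup < 0 || warmup >= latenciesMillis.length) {
            throw new IllegalArgumentException("warmup must leave at least one sample");
        }
    }

    public static void main(String[] args) {
        // Invented sample: two slow warm-up queries, then mostly fast ones with one spike.
        long[] latenciesMillis = {900, 400, 10, 12, 11, 2500, 9, 13};
        int warmup = 2;
        System.out.println("mean   = " + meanAfterWarmup(latenciesMillis, warmup) + " ms");
        System.out.println("median = " + percentileAfterWarmup(latenciesMillis, warmup, 50) + " ms");
        System.out.println("max    = " + percentileAfterWarmup(latenciesMillis, warmup, 100) + " ms");
    }
}
```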

BTW, didn't mention before that I'm using Lucene 3.5 and Java 1.7.

-----Original Message-----
From: Simon Willnauer [mailto:simon.willna...@gmail.com] 
Sent: 15 July 2012 11:56
To: java-user@lucene.apache.org
Subject: Re: In memory Lucene configuration

hey there,

On Sun, Jul 15, 2012 at 10:41 AM, Doron Yaacoby <dor...@gingersoftware.com> 
wrote:
> Hi, I have the following situation:
>
> I have two pretty large indices. One consists of about 1 billion documents 
> (takes ~6GB on disk) and the other has about 2 billion documents (~10GB on 
> disk). The documents are very short (4-5 terms each in the text field, and 
> one numeric field with a long value). This is a read only index - I'm only 
> going to read from it and never write. There is only one segment in each 
> index (at least there should be; I called forceMerge(1) on them).
>
> Search latency is the most important thing to me. I need it to be blazing 
> fast, ~20ms per query. Queries are always of the type +term1 +term2 +term3, 
> and I'm asking for 10 results from each index (searching is done 
> simultaneously on both indices).
>
> I have a fast server (12 cores @ 3GHz each) with 32GB RAM (running Linux) and I 
> can keep both indices in memory when using a RAMDirectory. This didn't 
> achieve the expected result (average query time = ~43ms). I'm seeing latency 
> spikes, where the same query is sometimes answered in 10ms, but on a 
> different occasion takes 2-3 seconds. I'm guessing this is due to GC (as 
> explained 
> here<http://lucene.472066.n3.nabble.com/Plans-to-remove-RAMDirectory-td3601156.html>).
> Using a warmed-up MMapDirectory didn't help; the average query time was a 
> bit slower. I tried using InstantiatedIndex, but its memory consumption is so 
> high that I couldn't even load the smaller 6GB index.

It's very hard to believe that you can't get this to return results faster, 
though. I'd definitely recommend MMapDirectory here, or NIOFSDirectory should 
do too. When you measure this, do you measure a large number of different 
queries or just a handful? Do you discard the first queries until caches are 
warmed up? What are you measuring: pure search time, or search time including 
doc loading?
If you use MMapDirectory, how much memory do you grant to your JVM? I'd 
recommend that you sum up the term dictionary file size (.tii) and the norm 
file size (.nrm) and give the JVM something like 3x that size as -Xmx and -Xms, 
provided you don't need any more memory elsewhere. A guess from the given index 
is that -Xmx1G -Xms1G should do the job; let the filesystem cache use the rest 
(that is important for Lucene if you use MMap / NIOFS).
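
The sizing rule above can be turned into a quick calculation. The following is 
a rough sketch, not a Lucene utility: HeapSizing and its methods are 
hypothetical names, the factor of 3 is just the rule of thumb from this 
message, and the index directory is taken from the command line. If the index 
uses the compound file format, the .tii and .nrm files are packed inside a 
.cfs file and this sketch will find nothing.

```java
import java.io.File;

/** Hypothetical helper: derives a heap size from the .tii and .nrm files of an index. */
class HeapSizing {

    /** Sums the sizes of regular files in indexDir whose names end with one of the suffixes. */
    static long sumBytes(File indexDir, String... suffixes) {
        File[] files = indexDir.listFiles();
        if (files == null) {
            throw new IllegalArgumentException("not a readable directory: " + indexDir);
        }
        long total = 0;
        for (File file : files) {
            if (!file.isFile()) {
                continue;
            }
            for (String suffix : suffixes) {
                if (file.getName().endsWith(suffix)) {
                    total += file.length();
                    break; // count each file at most once
                }
            }
        }
        return total;
    }

    /** Rule of thumb from the thread: roughly 3x the combined .tii and .nrm size. */
    static long suggestedHeapBytes(long tiiAndNrmBytes) {
        return 3 * tiiAndNrmBytes;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java HeapSizing <indexDir>");
            return;
        }
        long base = sumBytes(new File(args[0]), ".tii", ".nrm");
        long megabyte = 1024L * 1024L;
        // Round up to whole megabytes and never suggest less than 1 MB.
        long heapMb = Math.max(1, (suggestedHeapBytes(base) + megabyte - 1) / megabyte);
        System.out.println("-Xms" + heapMb + "m -Xmx" + heapMb + "m");
    }
}
```

Setting -Xms and -Xmx to the same value keeps the heap from resizing at 
runtime, and whatever RAM the JVM does not claim stays available to the OS for 
caching the index files.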

Are your queries straight boolean conjunctions, or do you use positions, i.e. 
phrase queries or spans?

simon
>
> Any ideas about what could be the ideal configuration for me?
> Thanks.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
