In memory Lucene configuration

Doron Yaacoby Sun, 15 Jul 2012 01:41:36 -0700

Hi, I have the following situation:

I have two pretty large indices. One consists of about 1 billion documents 
(takes ~6GB on disk) and the other has about 2 billion documents (~10GB on 
disk). The documents are very short (4-5 terms each in the text field, and one 
numeric field with a long value). Both indices are read-only: I'm only going to 
read from them and never write. Each index should have exactly one segment, 
since I called forceMerge(1) on both.


Search latency is the most important thing to me. I need it to be blazing fast, 
~20ms per query. Queries are always of the type +term1 +term2 +term3, and I'm 
asking for 10 results from each index (searching is done simultaneously on both 
indices).
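
For reference, the "search both indices simultaneously" part can be sketched 
roughly as below. This is only an illustration with plain JDK classes: the 
two Function arguments are hypothetical stand-ins for a searcher on each index 
(not Lucene API), and the class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class DualIndexSearch {
    /**
     * Submits the same query to both indices in parallel and waits for both.
     * indexA and indexB are hypothetical stand-ins for "search this index and
     * return the top hits"; a real setup would wrap one searcher per index.
     */
    public static List<List<String>> searchBoth(
            String query,
            Function<String, List<String>> indexA,
            Function<String, List<String>> indexB,
            ExecutorService pool) {
        CompletableFuture<List<String>> a =
                CompletableFuture.supplyAsync(() -> indexA.apply(query), pool);
        CompletableFuture<List<String>> b =
                CompletableFuture.supplyAsync(() -> indexB.apply(query), pool);
        // join() blocks until each search finishes, so the caller waits for
        // the slower of the two indices.
        return Arrays.asList(a.join(), b.join());
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Dummy searchers returning fixed hit lists, for illustration only.
            List<List<String>> hits = searchBoth(
                    "+term1 +term2 +term3",
                    q -> Arrays.asList("a1", "a2"),
                    q -> Arrays.asList("b1"),
                    pool);
            System.out.println(hits);
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, the latency seen by the caller is that of the slower index, 
so a spike on either index delays the whole query.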

I have a fast server (12 cores at 3GHz each, 32GB RAM, running Linux) and I 
can keep both indices in memory when using a RAMDirectory. This didn't achieve 
the expected result (average query time = ~43ms). I'm also seeing latency 
spikes: the same query is sometimes answered in 10ms, but on another occasion 
takes 2-3 seconds. I'm guessing this is due to GC (as explained 
here <http://lucene.472066.n3.nabble.com/Plans-to-remove-RAMDirectory-td3601156.html>).
Using a warmed-up MMapDirectory didn't help; the average query time was a bit 
slower. I also tried InstantiatedIndex, but its memory consumption is huge: 
I couldn't even load the smaller 6GB index.
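
Since an average hides the difference between steady-state cost and occasional 
multi-second stalls, it may help to look at percentiles as well. Below is a 
minimal sketch of that bookkeeping using only the JDK. The class name and the 
sample timings are made up for illustration; they are not measurements from 
the indices described above.

```java
import java.util.Arrays;

public class LatencyStats {
    /** Nearest-rank percentile of the samples, for p in (0, 100]. */
    public static long percentile(long[] samplesMs, double p) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Arithmetic mean of the samples. */
    public static double mean(long[] samplesMs) {
        long sum = 0;
        for (long s : samplesMs) {
            sum += s;
        }
        return (double) sum / samplesMs.length;
    }

    public static void main(String[] args) {
        // Invented timings: mostly around 10 ms, plus one multi-second stall.
        long[] ms = {10, 11, 9, 12, 10, 10, 11, 9, 10, 2500};
        System.out.printf("mean=%.1f median=%d p99=%d%n",
                mean(ms), percentile(ms, 50), percentile(ms, 99));
    }
}
```

In a series like this the median stays near the typical query time while the 
mean and the high percentiles are dominated by the single stall, which is the 
pattern one would expect if the spikes come from GC pauses rather than from 
the search itself.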

Any ideas about what could be the ideal configuration for me?
Thanks.
