Re: In memory Lucene configuration

Simon Willnauer Sun, 15 Jul 2012 01:56:27 -0700

hey there,

On Sun, Jul 15, 2012 at 10:41 AM, Doron Yaacoby
<[email protected]> wrote:
> Hi, I have the following situation:
>
> I have two pretty large indices. One consists of about 1 billion documents 
> (takes ~6GB on disk) and the other has about 2 billion documents (~10GB on 
> disk). The documents are very short (4-5 terms each in the text field, and 
> one numeric field with a long value). This is a read only index - I'm only 
> going to read from it and never write. There is only one segment in each 
> index (at least there should be; I called forceMerge(1) on them).
>
> Search latency is the most important thing to me. I need it to be blazing 
> fast, ~20ms per query. Queries are always of the type +term1 +term2 +term3, 
> and I'm asking for 10 results from each index (searching is done 
> simultaneously on both indices).
>
> I have a fast server (12 cores @ 3GHz each) with 32GB RAM (running Linux) and I
> can keep both indices in memory when using a RAMDirectory. This didn't
> achieve the expected result (average query time = ~43ms). I'm seeing latency
> spikes, where the same query is sometimes answered in 10ms, but on a
> different occasion takes 2-3 seconds. I'm guessing this is due to GC (as
> explained here:
> <http://lucene.472066.n3.nabble.com/Plans-to-remove-RAMDirectory-td3601156.html>).
> Using a warmed-up MMapDirectory didn't help; the average query time was a
> bit slower. I tried using InstantiatedIndex, but its memory consumption is
> huge; I couldn't even load the smaller 6GB index.


it's very hard to believe that you can't get this returning results
faster, though. I'd definitely recommend MMapDirectory here; NIOFSDirectory
should do too. When you measure this, do you measure a large number of
different queries or just a handful? Do you discard the first queries
until the caches are warmed up? What are you measuring: pure search time,
or search time including document loading?
If you use MMapDirectory, how much memory do you grant to your JVM? I'd
recommend summing up the size of the term dictionary index file (.tii) and
the norms file (.nrm) and giving the JVM something like 3x that size as
-Xmx and -Xms, provided you don't need more memory elsewhere. A guess
from the given index sizes is that -Xmx1G -Xms1G should do the job and leave
the rest to the filesystem cache (that is important for Lucene if you use
MMapDirectory / NIOFSDirectory).
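
A minimal sketch of that rule of thumb, assuming the non-compound Lucene 3.x file layout in which each segment's term dictionary index is a `*.tii` file and its norms a `*.nrm` file. If the merged segment was written as a compound file (`*.cfs`), those files are packed inside it and this check will not see them. The class and method names are hypothetical, and the 3x factor is only the estimate given above, not a Lucene constant.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HeapSizeHint {
    /**
     * Sums the sizes of the *.tii and *.nrm files directly inside one index
     * directory. Files packed into a compound file (*.cfs) are not counted.
     */
    public static long termIndexAndNormsBytes(Path indexDir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "*.{tii,nrm}")) {
            for (Path p : files) {
                total += Files.size(p);
            }
        }
        return total;
    }

    /** Rule of thumb from the message: roughly 3x the .tii + .nrm total. */
    public static long suggestedHeapBytes(long tiiPlusNrmBytes) {
        return 3 * tiiPlusNrmBytes;
    }

    /** Formats a byte count as a JVM size value in megabytes, rounded up (e.g. "512m"). */
    public static String asJvmMegabytes(long bytes) {
        long mb = (bytes + (1L << 20) - 1) >> 20;
        return mb + "m";
    }

    public static void main(String[] args) throws IOException {
        // Pass every index directory this JVM will search, e.g. both indices.
        long sum = 0;
        for (String dir : args) {
            sum += termIndexAndNormsBytes(Paths.get(dir));
        }
        String heap = asJvmMegabytes(suggestedHeapBytes(sum));
        System.out.println(".tii + .nrm total: " + sum + " bytes");
        System.out.println("suggested flags: -Xms" + heap + " -Xmx" + heap);
    }
}
```

Usage would be along the lines of `java HeapSizeHint /path/to/index1 /path/to/index2` (hypothetical paths). Both directories are summed because both indices are searched from the same JVM. Whatever RAM the heap does not take remains available to the OS page cache that MMapDirectory and NIOFSDirectory rely on.
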

Are your queries straight boolean conjunctions, or do you use positions,
i.e. phrase queries or spans?
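
Roughly, a pure +term1 +term2 +term3 query only has to intersect the sorted lists of document IDs for the three terms. Phrase and span queries do that too, but they must also read and compare term positions for every candidate document. The sketch below illustrates the "leapfrog" intersection idea over plain int arrays. It is a simplified illustration, not Lucene's implementation: Lucene works on postings iterators and can use skip data to jump ahead instead of stepping one entry at a time. The class name and doc IDs are made up, and doc IDs are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctionSketch {
    /**
     * Returns the doc IDs present in every list. Each list must be sorted ascending.
     * Leapfrog idea: take the largest current head as the candidate, advance every
     * list to the first doc >= candidate, and record a hit only when all heads agree.
     */
    public static List<Integer> intersect(int[][] postings) {
        List<Integer> hits = new ArrayList<>();
        if (postings.length == 0) {
            return hits;
        }
        int[] pos = new int[postings.length]; // current cursor into each list
        while (true) {
            int candidate = -1;
            for (int i = 0; i < postings.length; i++) {
                if (pos[i] >= postings[i].length) {
                    return hits; // one term is exhausted: no further doc can match all terms
                }
                candidate = Math.max(candidate, postings[i][pos[i]]);
            }
            boolean allMatch = true;
            for (int i = 0; i < postings.length; i++) {
                // Step linearly here; a real engine would use skip data to jump ahead.
                while (pos[i] < postings[i].length && postings[i][pos[i]] < candidate) {
                    pos[i]++;
                }
                if (pos[i] >= postings[i].length) {
                    return hits;
                }
                if (postings[i][pos[i]] != candidate) {
                    allMatch = false;
                }
            }
            if (allMatch) {
                hits.add(candidate);
                for (int i = 0; i < pos.length; i++) {
                    pos[i]++; // move every list past the matched doc
                }
            }
        }
    }

    public static void main(String[] args) {
        // Made-up doc IDs for the three terms of a query like +term1 +term2 +term3.
        int[][] postings = {
            {1, 3, 5, 7, 9},
            {3, 4, 5, 9},
            {0, 3, 9, 10},
        };
        System.out.println(intersect(postings)); // prints [3, 9]
    }
}
```

A phrase query would run this same intersection and then load positions for docs 3 and 9 to check term adjacency. That extra per-candidate work is why the answer to the question above matters for latency.
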

simon
>
> Any ideas about what could be the ideal configuration for me?
> Thanks.
>
