RE: In memory Lucene configuration

Doron Yaacoby Wed, 18 Jul 2012 23:24:25 -0700

I had a threading issue in the client code calling Lucene, really nothing that 
has anything to do with this list :)


-----Original Message-----
From: Simon Willnauer [mailto:[email protected]] 
Sent: 18 July 2012 21:48
To: [email protected]
Subject: Re: In memory Lucene configuration

Doron, enlighten me please!

On Wed, Jul 18, 2012 at 1:32 PM, Doron Yaacoby <[email protected]> 
wrote:
> Glad to announce the problem was on my side, and had nothing to do with 
> Lucene. Indeed, it looks like MMapDirectory is the best choice for me.
>
> Thanks again.
>
> -----Original Message-----
> From: Doron Yaacoby [mailto:[email protected]]
> Sent: 16 July 2012 09:43
> To: [email protected]
> Subject: RE: In memory Lucene configuration
>
> I haven't tried that yet, but it's an option. The reason I'm waiting on this 
> is that I am expecting many concurrent requests to my application anyway, so 
> having multiple search threads per request might not be the best idea in 
> production.
>
> -----Original Message-----
> From: Vitaly Funstein [mailto:[email protected]]
> Sent: 16 July 2012 08:26
> To: [email protected]
> Subject: Re: In memory Lucene configuration
>
> Have you tried sharding your data? Since you have a fast multi-core box, why 
> not split your indices N ways, say the smaller one into 4 and the larger 
> into 8? Then you can have a pool of dedicated search threads executing the 
> same query in parallel against the separate physical indices within each 
> "logical" one, then put the results together in the calling thread. Yes, it's more 
> code to write and test in the app layer, but it may turn out to be well worth 
> it. Due to GC overhead and poor synchronization characteristics, RAMDirectory 
> is definitely not the way to go at this scale, as you probably already 
> suspect.
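
The scatter-gather pattern described above could be sketched roughly as follows. This is a minimal illustration using only the JDK, not code from this thread: the `Shard` interface, the `Hit` class and `ShardedSearch.search` are hypothetical names, and a real `Shard` would wrap a searcher over one physical index. It is written so that it also compiles on Java 7.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class ShardedSearch {

    /** One scored hit. docId is local to its shard, so the shard number is kept too. */
    static final class Hit {
        final int shard;
        final int docId;
        final float score;

        Hit(int shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical per-shard hook; a real one would query one physical index. */
    interface Shard {
        List<Hit> search(String query, int topN) throws Exception;
    }

    /** Runs the same query on every shard in parallel and merges the best topN hits. */
    static List<Hit> search(List<Shard> shards, final String query, final int topN,
                            ExecutorService pool)
            throws InterruptedException, ExecutionException {
        // Scatter: one task per shard on the shared pool of search threads.
        List<Future<List<Hit>>> pending = new ArrayList<Future<List<Hit>>>();
        for (final Shard shard : shards) {
            pending.add(pool.submit(new Callable<List<Hit>>() {
                public List<Hit> call() throws Exception {
                    return shard.search(query, topN);
                }
            }));
        }
        // Gather: the calling thread waits for every shard and collects its hits.
        List<Hit> merged = new ArrayList<Hit>();
        for (Future<List<Hit>> f : pending) {
            merged.addAll(f.get());
        }
        // Assumption: scores from different shards are comparable. With per-shard
        // term statistics that holds only approximately.
        Collections.sort(merged, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                return Float.compare(b.score, a.score); // highest score first
            }
        });
        return new ArrayList<Hit>(merged.subList(0, Math.min(topN, merged.size())));
    }
}
```

Each shard is asked for the full topN because the overall best hits may all come from a single shard. The pool would normally be created once and shared across requests; whether the extra threads help under many concurrent requests is something only a load test can show.
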
>
> On Sun, Jul 15, 2012 at 3:40 AM, Doron Yaacoby <[email protected]> 
> wrote:
>> Thanks for the quick input!
>> I ran a few more tests with your suggested configuration (-Xmx1G -Xms1G with 
>> MMapDirectory). The third time I ran the same test I finally got an 
>> improvement: an average of ~30ms per query, although that's still not as fast 
>> as I need it to be.
>> The test contains about 2200 different queries (well, some are repeated 
>> twice or thrice), and includes search time and doc loading (reading the two 
>> fields I mentioned). The queries are all straight boolean conjunctions, and 
>> yes, I am dropping the first few queries when calculating averages.
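
The averaging described above (drop the first few queries, then average the rest) could look roughly like the sketch below. It is an illustration only, not the test harness used in this thread: `LatencyStats` and its method names are hypothetical, and the samples are assumed to be per-query times collected with something like `System.nanoTime()`. A nearest-rank percentile is included as well, since an average alone can hide occasional multi-second spikes.

```java
import java.util.Arrays;

final class LatencyStats {

    /** Returns the samples that remain after dropping the first `warmup` entries. */
    private static long[] afterWarmup(long[] samples, int warmup) {
        if (warmup < 0 || warmup >= samples.length) {
            throw new IllegalArgumentException("warmup must leave at least one sample");
        }
        return Arrays.copyOfRange(samples, warmup, samples.length);
    }

    /** Mean of the samples, ignoring the first `warmup` queries. */
    static double meanAfterWarmup(long[] samples, int warmup) {
        long[] kept = afterWarmup(samples, warmup);
        long sum = 0;
        for (long s : kept) {
            sum += s;
        }
        return (double) sum / kept.length;
    }

    /** Nearest-rank percentile (0 < p <= 100), ignoring the first `warmup` queries. */
    static long percentileAfterWarmup(long[] samples, int warmup, double p) {
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] kept = afterWarmup(samples, warmup);
        Arrays.sort(kept);
        int rank = (int) Math.ceil(p / 100.0 * kept.length); // 1-based rank
        return kept[Math.max(rank, 1) - 1];
    }
}
```

Comparing the mean with, say, the 99th percentile and the maximum makes it easier to tell a uniformly slow configuration from one that is fast most of the time but stalls occasionally.
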
>>
>> BTW, didn't mention before that I'm using Lucene 3.5 and Java 1.7.
>>
>> -----Original Message-----
>> From: Simon Willnauer [mailto:[email protected]]
>> Sent: 15 July 2012 11:56
>> To: [email protected]
>> Subject: Re: In memory Lucene configuration
>>
>> hey there,
>>
>> On Sun, Jul 15, 2012 at 10:41 AM, Doron Yaacoby <[email protected]> 
>> wrote:
>>> Hi, I have the following situation:
>>>
>>> I have two pretty large indices. One consists of about 1 billion documents 
>>> (takes ~6GB on disk) and the other has about 2 billion documents (~10GB on 
>>> disk). The documents are very short (4-5 terms each in the text field, and 
>>> one numeric field with a long value). This is a read only index - I'm only 
>>> going to read from it and never write. There is only one segment in each 
>>> index (At least there should be, I called forceMerge(1) on them).
>>>
>>> Search latency is the most important thing to me. I need it to be blazing 
>>> fast, ~20ms per query. Queries are always of the type +term1 +term2 +term3, 
>>> and I'm asking for 10 results from each index (searching is done 
>>> simultaneously on both indices).
>>>
>>> I have a fast server (12 cores @ 3GHz each) with 32GB RAM (running Linux) and 
>>> I can keep both indices in memory when using a RAMDirectory. This didn't 
>>> achieve the expected result (average query time = ~43ms). I'm seeing 
>>> latency spikes, where the same query is sometimes answered in 10ms, but on 
>>> a different occasion takes 2-3 seconds. I'm guessing this is due to GC (as 
>>> explained 
>>> here<http://lucene.472066.n3.nabble.com/Plans-to-remove-RAMDirectory-td3601156.html>).
>>> Using a warmed-up MMapDirectory didn't help; the average query time was a 
>>> bit slower. I tried using InstantiatedIndex, but its memory consumption is 
>>> huge; I couldn't even load the smaller 6GB index.
>>
>> It's very hard to believe that you can't get this returning results faster, 
>> though. I'd definitely recommend MMapDirectory here; NIOFSDirectory should 
>> do too. When you measure this, do you measure a large number of different 
>> queries or just a handful? Do you discard the first queries until caches are 
>> warmed up? What are you measuring: pure search time, or search time including 
>> doc loading?
>> If you use MMapDirectory, how much memory do you grant to your JVM? I'd 
>> recommend summing the term dictionary file size (.tii) and the norm file 
>> size (.nrm) and giving the JVM something like 3x that as both Xmx and Xms, 
>> provided you don't need any more memory elsewhere. A guess from the given 
>> index is that -Xmx1G -Xms1G should do the job; leave the rest to the 
>> filesystem cache (that is important for Lucene if you use MMapDirectory or 
>> NIOFSDirectory).
>>
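
The heap rule of thumb above (roughly 3x the combined size of the .tii and .nrm files) can be computed with a few lines of JDK code. This is a sketch under assumptions: `HeapEstimate` and its methods are hypothetical names, the index is assumed to sit in a single flat directory, and the factor of 3 is simply the suggestion quoted above rather than a measured value.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class HeapEstimate {

    /** Sums the sizes of regular files in indexDir whose names end with suffix. */
    static long totalSize(Path indexDir, String suffix) throws IOException {
        long total = 0;
        // The glob "*<suffix>" matches file names only, not nested directories.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "*" + suffix)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    total += Files.size(f);
                }
            }
        }
        return total;
    }

    /** 3 x (.tii + .nrm), per the rule of thumb; an estimate, not a guarantee. */
    static long suggestedHeapBytes(Path indexDir) throws IOException {
        return 3 * (totalSize(indexDir, ".tii") + totalSize(indexDir, ".nrm"));
    }
}
```

The result would be rounded up and passed as both -Xmx and -Xms, leaving the remaining RAM to the operating system.
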
>> Are your queries straight boolean conjunctions, or do you use positions, 
>> i.e. phrase queries or spans?
>>
>> simon
>>>
>>> Any ideas about what could be the ideal configuration for me?
>>> Thanks.
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>

