Re: In memory Lucene configuration

Simon Willnauer Mon, 16 Jul 2012 02:04:07 -0700

Your spikes could be due to garbage collection. Since you are on Java
1.7, you could try this command line (a blind shot):


  java -server
  -Xms1G
  -Xmx1G
  -Xss128k
  -XX:+UseParNewGC
  -XX:+UseConcMarkSweepGC
  -XX:CMSInitiatingOccupancyFraction=75
  -XX:+UseCMSInitiatingOccupancyOnly


Or maybe try the new G1 collector, although it is usually only useful for larger heaps:

  java -server
  -Xms1G
  -Xmx1G
  -Xss128k
  -XX:+UseG1GC
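
Before swapping collectors it may be worth confirming that the slow queries actually coincide with collections. A minimal sketch using the standard java.lang.management API follows; the class name GcProbe and the Runnable standing in for "run one query" are assumptions for illustration, not something from the application in this thread.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcProbe {
    /** Total number of collections reported by all collectors so far. */
    static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 if the collector does not report it
            if (c > 0) count += c;
        }
        return count;
    }

    /** Approximate accumulated collection time in milliseconds across all collectors. */
    static long totalGcTimeMillis() {
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) time += t;
        }
        return time;
    }

    /**
     * Runs the task once and returns
     * { elapsed ms, collections during the task, GC ms during the task }.
     */
    static long[] measure(Runnable task) {
        long count0 = totalGcCount();
        long time0 = totalGcTimeMillis();
        long start = System.nanoTime();
        task.run();
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        return new long[] { elapsedMs, totalGcCount() - count0, totalGcTimeMillis() - time0 };
    }
}
```

If the 2-3 second outliers show a non-zero collection count and a GC time close to the elapsed time, the collector settings above are worth tuning; if not, the cause is probably elsewhere (for example page cache misses). Note that the counters are JVM-wide, so with concurrent queries a collection is attributed to every query running at that moment.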

simon




On Mon, Jul 16, 2012 at 8:43 AM, Doron Yaacoby
<dor...@gingersoftware.com> wrote:
> I haven't tried that yet, but it's an option. The reason I'm waiting on this 
> is that I am expecting many concurrent requests to my application anyway, so 
> having multiple search threads per request might not be the best idea in 
> production.
>
> -----Original Message-----
> From: Vitaly Funstein [mailto:vfunst...@gmail.com]
> Sent: 16 July 2012 08:26
> To: java-user@lucene.apache.org
> Subject: Re: In memory Lucene configuration
>
> Have you tried sharding your data? Since you have a fast multi-core box, why 
> not split your indices N ways, say the smaller one into 4 and the larger 
> into 8? Then you can have a pool of dedicated search threads, executing the 
> same query against separate physical indices within each "logical" one in 
> parallel, then put the results together in the calling thread. Yes, it's more 
> code to write and test in the app layer, but it may turn out to be well worth 
> it. Due to GC overhead and poor synchronization characteristics, RAMDirectory 
> is definitely not the way to go at this scale, as you probably already 
> suspect.
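
A rough sketch of the scatter/gather pattern described above, for illustration only. Shard and Hit are made-up stand-ins (a real Shard would wrap a searcher over one physical index), and merging by score assumes scores are comparable across shards, which is itself an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class ShardedSearch {
    /** Hypothetical hit: which shard it came from, the shard-local doc id, and its score. */
    static final class Hit {
        final int shard;
        final int doc;
        final float score;
        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for one physical index. */
    interface Shard {
        List<Hit> search(String query, int topN) throws Exception;
    }

    /** Runs the same query on every shard in parallel, then merges in the calling thread. */
    static List<Hit> search(List<Shard> shards, ExecutorService pool,
                            final String query, final int topN) throws Exception {
        List<Future<List<Hit>>> futures = new ArrayList<Future<List<Hit>>>();
        for (final Shard shard : shards) {
            futures.add(pool.submit(new Callable<List<Hit>>() {
                public List<Hit> call() throws Exception {
                    return shard.search(query, topN); // each shard returns its own top N
                }
            }));
        }
        List<Hit> merged = new ArrayList<Hit>();
        for (Future<List<Hit>> f : futures) {
            merged.addAll(f.get()); // blocks until that shard has answered
        }
        Collections.sort(merged, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                return Float.compare(b.score, a.score); // highest score first
            }
        });
        return new ArrayList<Hit>(merged.subList(0, Math.min(topN, merged.size())));
    }
}
```

The pool would be created once and shared by all requests, so the number of search threads stays bounded regardless of how many requests arrive concurrently.
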
>
> On Sun, Jul 15, 2012 at 3:40 AM, Doron Yaacoby <dor...@gingersoftware.com> 
> wrote:
>> Thanks for the quick input!
>> I ran a few more tests with your suggested configuration (-Xmx1G -Xms1G with 
>> MMapDirectory). The third time I ran the same test I finally got an 
>> improvement - an average of ~30ms per query, although it's still not as fast 
>> as I need it to be.
>> The test contains about 2200 different queries (well, some are repeated 
>> twice or thrice), and includes search time and doc loading (reading the two 
>> fields I mentioned). The queries are all straight boolean conjunctions, and 
>> yes, I am dropping the first few queries when calculating averages.
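
For reference, the averaging described above (drop the first few queries, then average the rest) can be sketched as follows. The class name LatencyStats and the idea of also tracking the maximum are illustrative assumptions; the maximum is included because an average alone hides the 2-3 second spikes.

```java
final class LatencyStats {
    /** Average latency in ms, ignoring the first `warmup` measurements. */
    static double averageAfterWarmup(long[] latenciesMs, int warmup) {
        checkArgs(latenciesMs, warmup);
        long sum = 0;
        for (int i = warmup; i < latenciesMs.length; i++) {
            sum += latenciesMs[i];
        }
        return (double) sum / (latenciesMs.length - warmup);
    }

    /** Worst-case latency in ms, ignoring the first `warmup` measurements. */
    static long maxAfterWarmup(long[] latenciesMs, int warmup) {
        checkArgs(latenciesMs, warmup);
        long max = latenciesMs[warmup];
        for (int i = warmup + 1; i < latenciesMs.length; i++) {
            max = Math.max(max, latenciesMs[i]);
        }
        return max;
    }

    private static void checkArgs(long[] latenciesMs, int warmup) {
        if (warmup < 0 || warmup >= latenciesMs.length) {
            throw new IllegalArgumentException("warmup must be in [0, number of measurements)");
        }
    }
}
```
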
>>
>> BTW, didn't mention before that I'm using Lucene 3.5 and Java 1.7.
>>
>> -----Original Message-----
>> From: Simon Willnauer [mailto:simon.willna...@gmail.com]
>> Sent: 15 July 2012 11:56
>> To: java-user@lucene.apache.org
>> Subject: Re: In memory Lucene configuration
>>
>> hey there,
>>
>> On Sun, Jul 15, 2012 at 10:41 AM, Doron Yaacoby <dor...@gingersoftware.com> 
>> wrote:
>>> Hi, I have the following situation:
>>>
>>> I have two pretty large indices. One consists of about 1 billion documents 
>>> (takes ~6GB on disk) and the other has about 2 billion documents (~10GB on 
>>> disk). The documents are very short (4-5 terms each in the text field, and 
>>> one numeric field with a long value). This is a read-only index - I'm only 
>>> going to read from it and never write. There is only one segment in each 
>>> index (at least there should be; I called forceMerge(1) on them).
>>>
>>> Search latency is the most important thing to me. I need it to be blazing 
>>> fast, ~20ms per query. Queries are always of the type +term1 +term2 +term3, 
>>> and I'm asking for 10 results from each index (searching is done 
>>> simultaneously on both indices).
>>>
>>> I have a fast server (12 cores@3GHz each) with 32GB RAM (running Linux) and 
>>> I can keep both indices in memory when using a RAMDirectory. This didn't 
>>> achieve the expected result (average query time = ~43ms). I'm seeing 
>>> latency spikes, where the same query is sometimes answered in 10ms, but on 
>>> a different occasion takes 2-3 seconds. I'm guessing this is due to GC (as 
>>> explained 
>>> here<http://lucene.472066.n3.nabble.com/Plans-to-remove-RAMDirectory-td3601156.html>).
>>> Using a warmed up MMapDirectory didn't help; the average query time was a 
>>> bit slower. I tried using InstantiatedIndex, but its memory consumption is 
>>> so high that I couldn't even load the smaller 6GB index.
>>
>> It's very hard to believe that you can't get this returning results faster 
>> though. I'd definitely recommend MMapDirectory here, or NIOFSDirectory 
>> should do too. When you measure this, do you measure a large number of 
>> different queries or just a handful? Do you discard the first queries until 
>> caches are warmed up? What are you measuring: pure search time, or search 
>> time including doc loading?
>> If you use MMapDirectory, how much memory do you grant to your JVM? I'd
>> recommend summing up the term index file size (.tii) and the
>> norms file size (.nrm) and giving the JVM something like 3x that size as
>> -Xmx and -Xms, provided you don't need any more memory elsewhere. A guess
>> from the given index is that -Xmx1G -Xms1G should do the job; let the
>> filesystem cache use the rest (that is important for Lucene if you use
>> MMapDirectory / NIOFSDirectory).
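
The rule of thumb above (3x the combined size of the .tii and .nrm files) is easy to compute from the index directory. A small sketch, assuming the index is stored as separate files rather than in a compound (.cfs) file; the class and method names are made up for illustration.

```java
import java.io.File;

final class HeapEstimate {
    /** Sum of the sizes, in bytes, of the *.tii and *.nrm files directly inside indexDir. */
    static long termIndexAndNormBytes(File indexDir) {
        File[] files = indexDir.listFiles();
        if (files == null) {
            return 0; // not a directory, or it could not be read
        }
        long total = 0;
        for (File f : files) {
            String name = f.getName();
            if (f.isFile() && (name.endsWith(".tii") || name.endsWith(".nrm"))) {
                total += f.length();
            }
        }
        return total;
    }

    /** Rule-of-thumb heap size: three times the figure above. */
    static long suggestedHeapBytes(File indexDir) {
        return 3 * termIndexAndNormBytes(indexDir);
    }
}
```

The result is only a starting point for -Xms/-Xmx; anything else the application keeps on the heap has to be added on top.
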
>>
>> Are your queries straight boolean conjunctions, or do you use positions, 
>> i.e. phrase queries or spans?
>>
>> simon
>>>
>>> Any ideas about what could be the ideal configuration for me?
>>> Thanks.
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>

