Your spikes could be due to garbage collection. Since you are on Java 1.7, you could try this command line (a blind shot):

java -server -Xms1G -Xmx1G -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly

Or maybe try the new G1 collector, although it is usually only useful for larger heaps:

java -server -Xms1G -Xmx1G -Xss128k -XX:+UseG1GC

simon

On Mon, Jul 16, 2012 at 8:43 AM, Doron Yaacoby <dor...@gingersoftware.com> wrote:
> I haven't tried that yet, but it's an option. The reason I'm waiting on this
> is that I am expecting many concurrent requests to my application anyway, so
> having multiple search threads per request might not be the best idea in
> production.
>
> -----Original Message-----
> From: Vitaly Funstein [mailto:vfunst...@gmail.com]
> Sent: 16 July 2012 08:26
> To: java-user@lucene.apache.org
> Subject: Re: In memory Lucene configuration
>
> Have you tried sharding your data? Since you have a fast multi-core box, why
> not split your indices N-ways, say the smaller one into 4, and the larger
> into 8. Then you can have a pool of dedicated search threads, executing the
> same query against separate physical indices within each "logical" one in
> parallel, then put the results together in the calling thread. Yes, it's more
> code to write and test in the app layer, but it may turn out to be well worth
> it. Due to GC overhead and poor synchronization characteristics, RAMDirectory
> is definitely not the way to go at this scale, as you probably already
> suspect.
>
> On Sun, Jul 15, 2012 at 3:40 AM, Doron Yaacoby <dor...@gingersoftware.com>
> wrote:
>> Thanks for the quick input!
>> I ran a few more tests with your suggested configuration (-Xmx1G -Xms1G with
>> MMapDirectory). At the third time I ran the same test I finally got an
>> improvement - an average of ~30ms per query, although it's still not as fast
>> as I need it to be.
>> The test contains about 2200 different queries (well, some are repeated
>> twice or thrice), and includes search time and doc loading (reading the two
>> fields I mentioned).
>> The queries are all straight boolean conjunctions, and
>> yes, I am dropping the first few queries when calculating averages.
>>
>> BTW, didn't mention before that I'm using Lucene 3.5 and Java 1.7.
>>
>> -----Original Message-----
>> From: Simon Willnauer [mailto:simon.willna...@gmail.com]
>> Sent: 15 July 2012 11:56
>> To: java-user@lucene.apache.org
>> Subject: Re: In memory Lucene configuration
>>
>> hey there,
>>
>> On Sun, Jul 15, 2012 at 10:41 AM, Doron Yaacoby <dor...@gingersoftware.com>
>> wrote:
>>> Hi, I have the following situation:
>>>
>>> I have two pretty large indices. One consists of about 1 billion documents
>>> (takes ~6GB on disk) and the other has about 2 billion documents (~10GB on
>>> disk). The documents are very short (4-5 terms each in the text field, and
>>> one numeric field with a long value). This is a read only index - I'm only
>>> going to read from it and never write. There is only one segment in each
>>> index (At least there should be, I called forceMerge(1) on them).
>>>
>>> Search latency is the most important thing to me. I need it to be blazing
>>> fast, ~20ms per query. Queries are always of the type +term1 +term2 +term3,
>>> and I'm asking for 10 results from each index (searching is done
>>> simultaneously on both indices).
>>>
>>> I have a fast server (12 cores@3GHz each) with 32Gb RAM (running Linux) and
>>> I can keep both indices in-memory when using a RAMDirectory. This didn't
>>> achieve the expected result (average query time = ~43ms). I'm seeing
>>> latency spikes, where the same query is sometimes answered in 10ms, but in
>>> a different occasion takes 2-3 seconds. I'm guessing this is due to GC (as
>>> explained
>>> here<http://lucene.472066.n3.nabble.com/Plans-to-remove-RAMDirectory-td3601156.html>).
>>> Using a warmed up MMapDirectory didn't help; the average query time was a
>>> bit slower. I tried using InstantiatedIndex, but it has a huge memory
>>> consumption, I couldn't even load the smaller 6GB index.
>>
>> its very hard to believe that you can't get this returning results faster
>> though. I'd definitely recommend you MMapDirectory here or NIO should do
>> too. When you measure this do you measure a large number of different
>> queries or just a handful? Do you discard the first queries until caches are
>> warmed up? What are you measuring, pure search time including doc loading?
>> If you use MMapDir how much memory do you grant to your JVM? I'd
>> recommend you to sum up the term dictionary file size (.tii) and the
>> norm file size (nrm) and give the JVM something like 3x the size as
>> Xmx and Xms provided you don't need any more memory elsewhere. A guess
>> from the given index is that Xmx1G Xms1G should do the job and let the
>> Filesystem use the rest (that is important for lucene if you use MMap
>> / NIOFS)
>>
>> Your queries are straight boolean conjunctions or do you use positions ie
>> phrase queries or spans?
>>
>> simon
>>>
>>> Any ideas about what could be the ideal configuration for me?
>>> Thanks.
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
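
Before switching collectors, it may be worth checking whether the slow queries actually coincide with GC activity. The following is only a minimal sketch using the JDK's GarbageCollectorMXBean; the class name GcSpikeCheck, the helper timeQuery and the allocation-heavy fakeQuery are hypothetical placeholders and not part of Lucene or of the setup discussed in this thread. In a real test the Runnable would wrap the actual search and doc loading. Note that the reported GC time is JVM-wide and, with a concurrent collector, may include work that did not pause the query thread, so treat it as a hint rather than proof.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSpikeCheck {

    /** Sum of accumulated GC time (ms) over all collectors in this JVM. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Runs one query and returns {elapsedMillis, gcMillisAccumulatedMeanwhile}. */
    public static long[] timeQuery(Runnable query) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        query.run();
        long elapsedMillis = (System.nanoTime() - start) / 1000000L;
        long gcMillis = totalGcMillis() - gcBefore;
        return new long[] { elapsedMillis, gcMillis };
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a real search: it just churns memory.
        Runnable fakeQuery = new Runnable() {
            public void run() {
                byte[][] junk = new byte[1000][];
                for (int i = 0; i < 200000; i++) {
                    junk[i % junk.length] = new byte[1024];
                }
            }
        };
        for (int i = 0; i < 5; i++) {
            long[] r = timeQuery(fakeQuery);
            System.out.println("query " + i + ": " + r[0] + " ms total, "
                    + r[1] + " ms of GC in the same window");
        }
    }
}
```

If the 2-3 second outliers show a large GC figure in the same window, the collector settings above are worth trying; if they do not, the cause is more likely elsewhere (for example page faults on a cold mmap).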