Re: Analyzing performance and memory consumption for boolean queries

2009-06-26 Thread Toke Eskildsen
On Wed, 2009-06-24 at 21:38 +0200, Nigel wrote: > It sounds like surely any swapping out of the JVM memory could cause big and > unpredictable performance drops. As I just mentioned in reply to Uwe, our > poor performance times don't always directly correlate with index updates, > but it may be th

Re: Analyzing performance and memory consumption for boolean queries

2009-06-25 Thread Nigel
On Wed, Jun 24, 2009 at 4:47 PM, Uwe Schindler wrote: > Have you tried out, if GC affects you? A first step would be to turn on GC > logging with -verbosegc -XX:+PrintGCDetails > > If you see some relation between query time and gc messages, you should try > to use a better parallelized GC and ch
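
[Editor's sketch] GC logging on the command line shows when collections happen. A rough in-process complement is to read the JVM's own collector counters around each search and log the two numbers side by side. The class and method names below are hypothetical, and the task passed to time() stands in for whatever search call the application makes. The counters are JVM-wide, so with concurrent searches the GC figure is only an upper bound for any single query.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports wall-clock time and GC time for one task.
class GcAwareTimer {
    /** Total milliseconds spent in all collectors so far (collectors reporting -1 are skipped). */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Runs the task and returns {wall-clock ms, GC ms accumulated JVM-wide during the run}. */
    public static long[] time(Runnable task) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        task.run();
        long wallMs = (System.nanoTime() - start) / 1_000_000;
        return new long[] { wallMs, totalGcMillis() - gcBefore };
    }

    public static void main(String[] args) {
        // Placeholder workload; a real caller would run its search here instead.
        long[] r = time(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        });
        System.out.println("wall=" + r[0] + "ms gc=" + r[1] + "ms");
    }
}
```

If slow queries consistently show a large GC figure, that supports the GC explanation; if they show none, swapping or a cold IO cache becomes the more likely suspect.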

Re: Analyzing performance and memory consumption for boolean queries

2009-06-24 Thread Michael McCandless
On Wed, Jun 24, 2009 at 3:38 PM, Nigel wrote: > Yes, we're indexing on a separate server, and rsyncing from index snapshots > there to the search servers.  Usually rsync has to copy just a few small > .cfs files, but every once in a while merging will produce a big one.  I'm > going to try to limi

RE: Analyzing performance and memory consumption for boolean queries

2009-06-24 Thread Uwe Schindler
). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Nigel [mailto:nigelspl...@gmail.com] > Sent: Wednesday, June 24, 2009 8:54 PM > To: java-user@lucene.apache.org > Subject: Re: Analyzing perform

Re: Analyzing performance and memory consumption for boolean queries

2009-06-24 Thread Nigel
Hi Mike, Yes, we're indexing on a separate server, and rsyncing from index snapshots there to the search servers. Usually rsync has to copy just a few small .cfs files, but every once in a while merging will produce a big one. I'm going to try to limit this by setting maxMergeMB, but of course t

Re: Analyzing performance and memory consumption for boolean queries

2009-06-24 Thread Nigel
Hi Uwe, Good points, thank you. The obvious place where GC really has to work hard is when index changes are rsync'd over and we have to open the new index and close the old one. Our slow performance times don't seem to be directly correlated with the index rotation, but maybe it just appears th

Re: Analyzing performance and memory consumption for boolean queries

2009-06-24 Thread Nigel
Thanks Otis -- I'll give that a try. I think this relates to the first question in my original message, which was what (if any) of the inverted index structure is explicitly cached by Lucene in the JVM. Clearly there's something, since a large JVM heap is required to avoid running out of memory,

Re: Analyzing performance and memory consumption for boolean queries

2009-06-24 Thread Nigel
Hi Ken, Thanks for your reply. I agree that your overall diagnosis (GC problems and/or swapping) sounds likely. To follow up on some of the specific things you mentioned: 2. 250M/4 = 60M docs/index. The old rule of thumb was 10M docs/index as a > reasonable size. You might just need more hardware.

Re: Analyzing performance and memory consumption for boolean queries

2009-06-24 Thread Michael McCandless
Is it possible the occasional large merge is clearing out the IO cache (thus "unwarming" your searcher)? (Though since you're rsync'ing your updates in, it sounds like a separate machine is building the index). Or... linux will happily swap out a process's core in favor of IO cache (though I'd ex

Re: Analyzing performance and memory consumption for boolean queries

2009-06-24 Thread eks dev
another performance tip, what helps "a lot" is collection sorting before you index. if you can somehow logically partition your index, you can improve locality of reference by sorting. What I mean by this: imagine index with following fields: zip, user_group, some text if typical query
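
[Editor's sketch] The sorting tip can be illustrated without any Lucene calls: order the source records before they are handed to the indexer, so documents that tend to match the same query get neighbouring document ids. The record type and the choice of sort key (zip first, then user_group) are assumptions for illustration only; the message is cut off before it says which fields the typical query uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical pre-indexing step: sort the collection, then index in that order.
class PreIndexSort {
    /** Hypothetical record with the three fields named in the message. */
    static final class Rec {
        final String zip;
        final String userGroup;
        final String text;

        Rec(String zip, String userGroup, String text) {
            this.zip = zip;
            this.userGroup = userGroup;
            this.text = text;
        }
    }

    /** Returns a sorted copy (zip, then user_group); the input list is left untouched. */
    public static List<Rec> sortForIndexing(List<Rec> input) {
        List<Rec> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing((Rec r) -> r.zip)
                              .thenComparing(r -> r.userGroup));
        return sorted;
    }
}
```

The indexer would then add documents in the returned order. Later segment merges generally keep documents in their relative order, but that depends on the merge policy in use and should be checked rather than assumed.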

Re: Analyzing performance and memory consumption for boolean queries

2009-06-24 Thread eks dev
I guess you set omitNorms() and omitTf() during indexing for all fields? If not, try this. It helps a lot. Good luck, eks - Original Message > From: Uwe Schindler > To: java-user@lucene.apache.org > Sent: Wednesday, 24 June, 2009 9:33:08 > Subject: RE: Analyzing perform

RE: Analyzing performance and memory consumption for boolean queries

2009-06-24 Thread Uwe Schindler
> 1. For search time to vary from < 1 second => 20 seconds, the only > two things I've seen are: > > * Serious JVM garbage collection problems. > * You're in Linux swap hell. > > We tracked similar issues down by creating a testbed that let us run > a set of real-world queries, such that we could
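
[Editor's sketch] A testbed of the kind mentioned here can be as small as a loop that replays logged queries and records each latency. This is not the testbed from the message, whose details are cut off. All names are hypothetical, and the `search` callback stands in for the real search call. Reporting a high percentile next to the median makes the occasional 20-second outlier visible instead of averaging it away.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical replay harness for a fixed list of query strings.
class QueryReplay {
    /** Runs every query once through `search`; returns per-query latencies in ms, in input order. */
    public static List<Long> replay(List<String> queries, Consumer<String> search) {
        List<Long> latencies = new ArrayList<>();
        for (String q : queries) {
            long start = System.nanoTime();
            search.accept(q);
            latencies.add((System.nanoTime() - start) / 1_000_000);
        }
        return latencies;
    }

    /** Nearest-rank percentile (p in 1..100) of a non-empty list of values. */
    public static long percentile(List<Long> values, int p) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }
}
```

Running the same query set before and after an index rotation, or with different heap and GC settings, gives comparable numbers for each configuration.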

Re: Analyzing performance and memory consumption for boolean queries

2009-06-23 Thread Otis Gospodnetic
23, 2009 4:53:09 PM > Subject: Analyzing performance and memory consumption for boolean queries > > Our query performance is surprisingly inconsistent, and I'm trying to figure > out why. I've realized that I need to better understand what's going on > internally in L

Re: Analyzing performance and memory consumption for boolean queries

2009-06-23 Thread Ken Krugler
Hi Chris, Others on this list will be able to provide much better optimization suggestions, but based on my experience with some large indexes... 1. For search time to vary from < 1 second => 20 seconds, the only two things I've seen are: * Serious JVM garbage collection problems. * You're

Analyzing performance and memory consumption for boolean queries

2009-06-23 Thread Nigel
Our query performance is surprisingly inconsistent, and I'm trying to figure out why. I've realized that I need to better understand what's going on internally in Lucene when we're searching. I'd be grateful for any answers (including pointers to existing docs, if any). Our situation is this: We