On Wed, 2009-06-24 at 21:38 +0200, Nigel wrote:
> It sounds like surely any swapping out of the JVM memory could cause big and
> unpredictable performance drops. As I just mentioned in reply to Uwe, our
> poor performance times don't always directly correlate with index updates,
> but it may be th
On Wed, Jun 24, 2009 at 4:47 PM, Uwe Schindler wrote:
> Have you tried out, if GC affects you? A first step would be to turn on GC
> logging with -verbosegc -XX:+PrintGCDetails
>
> If you see some relation between query time and gc messages, you should try
> to use a better parallelized GC and ch
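
Besides reading the GC log, the relation between query time and GC activity can be checked roughly from inside the JVM. The following is a minimal sketch, not anything from the thread: it uses the standard `java.lang.management` beans to compare wall-clock time with accumulated collector time around a task. The class name `GcTimer` and the `Runnable` standing in for a real search call are hypothetical. The GC figure is JVM-wide, so on a busy server it is only a hint.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: compares wall-clock time with GC time around a task.
final class GcTimer {

    // Sum of accumulated collection time (ms) over all collectors in this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Runs the task and returns { wallMillis, gcMillis } measured around it.
    static long[] timeWithGc(Runnable task) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        task.run();
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        return new long[] { wallMillis, gcMillis };
    }

    public static void main(String[] args) {
        // Stand-in for a real query: allocate short-lived garbage.
        long[] r = timeWithGc(() -> {
            for (int i = 0; i < 100_000; i++) {
                byte[] junk = new byte[1024];
                junk[0] = 1;
            }
        });
        System.out.println("wall ms: " + r[0] + ", gc ms during task: " + r[1]);
    }
}
```

If slow queries consistently show a large GC share, that supports the GC explanation; if they do not, swapping or a cold IO cache becomes more likely.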
On Wed, Jun 24, 2009 at 3:38 PM, Nigel wrote:
> Yes, we're indexing on a separate server, and rsyncing from index snapshots
> there to the search servers. Usually rsync has to copy just a few small
> .cfs files, but every once in a while merging will produce a big one. I'm
> going to try to limi
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Nigel [mailto:nigelspl...@gmail.com]
> Sent: Wednesday, June 24, 2009 8:54 PM
> To: java-user@lucene.apache.org
> Subject: Re: Analyzing perform
Hi Mike,
Yes, we're indexing on a separate server, and rsyncing from index snapshots
there to the search servers. Usually rsync has to copy just a few small
.cfs files, but every once in a while merging will produce a big one. I'm
going to try to limit this by setting maxMergeMB, but of course t
Hi Uwe,
Good points, thank you. The obvious place where GC really has to work hard
is when index changes are rsync'd over and we have to open the new index and
close the old one. Our slow performance times don't seem to be directly
correlated with the index rotation, but maybe it just appears th
Thanks Otis -- I'll give that a try. I think this relates to the first
question in my original message, which was what (if any) of the inverted
index structure is explicitly cached by Lucene in the JVM. Clearly there's
something, since a large JVM heap is required to avoid running out of
memory,
Hi Ken,
Thanks for your reply. I agree that your overall diagnosis (GC problems
and/or swapping) sounds likely. To follow up on some of the specific things
you mentioned:
> 2. 250M/4 = 60M docs/index. The old rule of thumb was 10M docs/index as a
> reasonable size. You might just need more hardware.
Is it possible the occasional large merge is clearing out the IO cache
(thus "unwarming" your searcher)? (Though since you're rsync'ing your
updates in, it sounds like a separate machine is building the index).
Or... linux will happily swap out a process's core in favor of IO
cache (though I'd ex
another performance tip, what helps "a lot" is collection sorting before you
index.
if you can somehow logically partition your index, you can improve locality of
reference by sorting.
What I mean by this:
imagine index with following fields: zip, user_group, some text
if typical query
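
The sorting idea can be sketched as follows. This is only an illustration under assumptions: the original message is cut off, so the choice of `zip` then `user_group` as sort keys is a guess at the intended partitioning, and the class and method names are hypothetical. The point is that documents are added to the index in sorted order, so documents sharing a zip and group get adjacent document IDs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical pre-indexing step: order the collection before feeding it to the indexer.
final class PreIndexSort {

    // Plain holder for the three fields named in the example.
    static final class Doc {
        final String zip;
        final String userGroup;
        final String text;

        Doc(String zip, String userGroup, String text) {
            this.zip = zip;
            this.userGroup = userGroup;
            this.text = text;
        }
    }

    // Returns a copy sorted by zip, then user group (assumed partition keys).
    static List<Doc> sortForIndexing(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing((Doc d) -> d.zip)
                .thenComparing(d -> d.userGroup));
        return sorted;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("90210", "b", "third"));
        docs.add(new Doc("10001", "b", "second"));
        docs.add(new Doc("10001", "a", "first"));
        for (Doc d : sortForIndexing(docs)) {
            // In a real setup each Doc would be added to the index here, in this order.
            System.out.println(d.zip + " " + d.userGroup + " " + d.text);
        }
    }
}
```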
I guess you set omitNorms() and omitTf()
during indexing for all fields? If not, try this. It helps a lot
good luck,
eks
- Original Message
> From: Uwe Schindler
> To: java-user@lucene.apache.org
> Sent: Wednesday, 24 June, 2009 9:33:08
> Subject: RE: Analyzing perform
> 1. For search time to vary from < 1 second => 20 seconds, the only
> two things I've seen are:
>
> * Serious JVM garbage collection problems.
> * You're in Linux swap hell.
>
> We tracked similar issues down by creating a testbed that let us run
> a set of real-world queries, such that we could
23, 2009 4:53:09 PM
> Subject: Analyzing performance and memory consumption for boolean queries
>
> Our query performance is surprisingly inconsistent, and I'm trying to figure
> out why. I've realized that I need to better understand what's going on
> internally in L
Hi Chris,
Others on this list will be able to provide much better optimization
suggestions, but based on my experience with some large indexes...
1. For search time to vary from < 1 second => 20 seconds, the only
two things I've seen are:
* Serious JVM garbage collection problems.
* You're
Our query performance is surprisingly inconsistent, and I'm trying to figure
out why. I've realized that I need to better understand what's going on
internally in Lucene when we're searching. I'd be grateful for any answers
(including pointers to existing docs, if any).
Our situation is this: We