Re: Lucene performance bottlenecks

2005-12-12 Thread Chris Hostetter
: Oh, BTW: I just found the DisjunctionMaxQuery class, recently added it
: seems. Do you think this query structure could benefit from using it
: instead of the BooleanQuery?

DisjunctionMaxQuery kicks ass (in my opinion), and it certainly seems like (from your query structure) it's something you
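To make the comparison concrete, here is a minimal sketch in plain Java (no Lucene dependency) of two ways the per-field scores for one term could be combined: summing them, roughly as a BooleanQuery of optional clauses does (coord and norm factors are ignored here), versus taking the best field score plus a tie-breaker fraction of the rest, which is the idea behind DisjunctionMaxQuery. The class name, method names and score values are made up for illustration and are not taken from the thread.

```java
class DisMaxSketch {
    // Sum of all per-field scores (BooleanQuery-style; coord factor ignored).
    static double sumScore(double[] fieldScores) {
        double sum = 0.0;
        for (double s : fieldScores) {
            sum += s;
        }
        return sum;
    }

    // Best per-field score plus tieBreaker times the remaining scores.
    // Assumes all scores are non-negative.
    static double disMaxScore(double[] fieldScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : fieldScores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Hypothetical scores of one term in url, anchor, content, title, host.
        double[] scores = {4.0, 2.0, 1.0, 1.5, 2.0};
        System.out.println(sumScore(scores));
        System.out.println(disMaxScore(scores, 0.0));
        System.out.println(disMaxScore(scores, 0.5));
    }
}
```

With a tie-breaker of 0 a document matching the term in several fields scores no higher than one matching only in its best field; raising the tie-breaker moves the result back towards the plain sum.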

Re: Lucene performance bottlenecks

2005-12-12 Thread Andrzej Bialecki
Paul Elschot wrote: There is one indexing parameter that might help performance for BooleanScorer2, it is the skip interval in Lucene's TermInfosWriter. The current value is 16, and there was a question about it on 16 Oct 2005 on java-dev with title "skipInterval". I don't know how the value of
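As a rough illustration of why the skip interval matters, here is a minimal in-memory sketch of skipping through a sorted list of document numbers with one skip entry per 16 postings. It is not Lucene's on-disk format or the TermInfosWriter code; the class, field and method names are hypothetical, and only the value 16 comes from the message above.

```java
class SkipListSketch {
    static final int SKIP_INTERVAL = 16; // value mentioned in the thread

    final int[] docs;      // sorted document numbers
    final int[] skipDocs;  // last document number of each full block

    SkipListSketch(int[] sortedDocs) {
        docs = sortedDocs;
        skipDocs = new int[docs.length / SKIP_INTERVAL];
        for (int i = 0; i < skipDocs.length; i++) {
            skipDocs[i] = docs[(i + 1) * SKIP_INTERVAL - 1];
        }
    }

    // Returns the index of the first document >= target at or after
    // position 'from', or docs.length if there is none.
    int skipTo(int from, int target) {
        int pos = from;
        int block = pos / SKIP_INTERVAL;
        // Hop over whole blocks whose last document is still below target.
        while (block < skipDocs.length && skipDocs[block] < target) {
            block++;
            pos = block * SKIP_INTERVAL;
        }
        // Finish with a linear scan inside the block.
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        int[] docs = new int[100];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = 3 * i;
        }
        SkipListSketch list = new SkipListSketch(docs);
        System.out.println(list.skipTo(0, 100));
    }
}
```

A smaller interval means more skip entries to store but shorter linear scans; a larger one means the opposite, which is the trade-off behind tuning the value.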

Re: Lucene performance bottlenecks

2005-12-11 Thread Paul Elschot
On Wednesday 07 December 2005 10:51, Andrzej Bialecki wrote:
> Paul Elschot wrote:
> >On Saturday 03 December 2005 14:09, Andrzej Bialecki wrote:
> >>Paul Elschot wrote:
> >> ...
> >>>This is one of the cases in which BooleanScorer2 can be faster
> >>>than the 1.4 BooleanScorer because the 1.4 Boo

RE: Lucene performance bottlenecks

2005-12-08 Thread Dalton, Jeffery
Andrzej,

I think you did a great job elucidating my thoughts as well. I heartily concur with everything you said.

Andrzej Bialecki wrote:
> Hmm... Please define what "adequate" means. :-) IMHO,
> "adequate" is when for any query the response time is well
> below 1 second. Otherwise the serv

Re: Lucene performance bottlenecks

2005-12-08 Thread Andrzej Bialecki
(Moving the discussion to nutch-dev, please drop the cc: when responding) Doug Cutting wrote: Andrzej Bialecki wrote: It's nice to have these couple percent... however, it doesn't solve the main problem; I need 50 or more percent increase... :-) and I suspect this can be achieved only by som

Re: Lucene performance bottlenecks

2005-12-07 Thread Doug Cutting
Andrzej Bialecki wrote: It's nice to have these couple percent... however, it doesn't solve the main problem; I need 50 or more percent increase... :-) and I suspect this can be achieved only by some radical changes in the way Nutch uses Lucene. It seems the default query structure is too compl

Re: Lucene performance bottlenecks

2005-12-07 Thread Andrzej Bialecki
Yonik Seeley wrote: if (b>0) return b; Doing an 'and' of two bytes and checking if the result is 0 probably requires masking operations on >8 bit processors... Sometimes you can get a peek into how a JVM would optimize things by looking at the asm output of the code from a C compiler. Bot

Re: Lucene performance bottlenecks

2005-12-07 Thread Doug Cutting
Paul Elschot wrote: Querying the host field like this in a web page index can be dangerous business. For example when term1 is "wikipedia" and term2 is "org", the query will match at least all pages from wikipedia.org. Note that if you search for wikipedia.org in Nutch this is interpreted as a

Re: Lucene performance bottlenecks

2005-12-07 Thread Yonik Seeley
> if (b>0) return b;
> Doing an 'and' of two bytes and checking if the result is 0 probably
> requires masking operations on >8 bit processors...

Sometimes you can get a peek into how a JVM would optimize things by looking at the asm output of the code from a C compiler. Both (b>=0) and ((b&0x80)!

Re: Lucene performance bottlenecks

2005-12-07 Thread Yonik Seeley
On 12/7/05, Vanlerberghe, Luc <[EMAIL PROTECTED]> wrote:
> Since 'byte' is signed in Java, can't the first test be simply written
> as
> if (b>0) return b;
> Doing an 'and' of two bytes and checking if the result is 0 probably
> requires masking operations on >8 bit processors...

Yep, that was my
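The point about signed bytes can be checked exhaustively. The following small sketch (class and method names are hypothetical) walks all 256 byte values and confirms that a sign comparison and a mask of the high bit agree, and also counts how many values the strict test `b > 0` treats differently from `b >= 0`.

```java
class SignBitCheck {
    // True if (b >= 0) and ((b & 0x80) == 0) agree for every byte value.
    static boolean signTestMatchesMask() {
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            boolean bySign = b >= 0;
            boolean byMask = (b & 0x80) == 0; // b is promoted to int first
            if (bySign != byMask) {
                return false;
            }
        }
        return true;
    }

    // Number of byte values where (b > 0) and (b >= 0) give different answers.
    static int strictTestDifferences() {
        int count = 0;
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            if ((b > 0) != (b >= 0)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(signTestMatchesMask());
        System.out.println(strictTestDifferences());
    }
}
```

The only value on which the strict test differs is zero, so a decoder using `b > 0` as its shortcut simply sends a zero byte down the general path instead of the fast one.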

RE: Lucene performance bottlenecks

2005-12-07 Thread Vanlerberghe, Luc
all operators use ints...

Luc

-----Original Message-----
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Wednesday 7 December 2005 16:11
To: java-user@lucene.apache.org
Subject: Re: Lucene performance bottlenecks

I checked out readVInt() to see if I could optimize it any... For a random

Re: Lucene performance bottlenecks

2005-12-07 Thread Yonik Seeley
I checked out readVInt() to see if I could optimize it any... For a random distribution of integers <200 I was able to speed it up a little bit, but nothing to write home about:

                 old     new     percent
Java14-client :  13547   12468   8%
Java14-server:   6047    5266    14%
Java1
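For readers unfamiliar with the encoding being tuned, here is a minimal, self-contained sketch of variable-length integer decoding: seven data bits per byte, low-order group first, with the high bit set when more bytes follow. It reads from a plain byte array rather than a Lucene input stream, it is not the code that was benchmarked above, and the class name and the writeVInt helper are assumptions added only so the example can round-trip values.

```java
import java.util.Arrays;

class VIntSketch {
    private final byte[] buf;
    private int pos;

    VIntSketch(byte[] buf) {
        this.buf = buf;
    }

    // Decodes one variable-length int starting at the current position.
    int readVInt() {
        byte b = buf[pos++];
        if (b >= 0) {
            return b; // fast path: single byte, high bit clear
        }
        int value = b & 0x7F;
        for (int shift = 7; b < 0; shift += 7) {
            b = buf[pos++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Hypothetical helper: encodes a value so that readVInt can be exercised.
    static byte[] writeVInt(int value) {
        byte[] tmp = new byte[5];
        int n = 0;
        while ((value & ~0x7F) != 0) {
            tmp[n++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        tmp[n++] = (byte) value;
        return Arrays.copyOf(tmp, n);
    }

    public static void main(String[] args) {
        System.out.println(new VIntSketch(writeVInt(300)).readVInt());
    }
}
```

Values below 128 fit in one byte and take the fast path, which is why a test set of integers under 200 mostly exercises that first branch.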

Re: Lucene performance bottlenecks

2005-12-07 Thread Andrzej Bialecki
Paul Elschot wrote:
On Saturday 03 December 2005 14:09, Andrzej Bialecki wrote:
Paul Elschot wrote:
In somewhat more readable layout:

+(url:term1^4.0 anchor:term1^2.0 content:term1 title:term1^1.5 host:term1^2.0)
+(url:term2^4.0 anchor:term2^2.0 content:term2 title:term2^1.5 host:

Re: Lucene performance bottlenecks

2005-12-03 Thread Paul Elschot
On Saturday 03 December 2005 14:09, Andrzej Bialecki wrote:
> Paul Elschot wrote:
>
> >In somewhat more readable layout:
> >
> >+(url:term1^4.0 anchor:term1^2.0 content:term1
> > title:term1^1.5 host:term1^2.0)
> >+(url:term2^4.0 anchor:term2^2.0 content:term2
> > title:term2^1.5 host:term2^

Re: Lucene performance bottlenecks

2005-12-03 Thread Andrzej Bialecki
Paul Elschot wrote:
In somewhat more readable layout:

+(url:term1^4.0 anchor:term1^2.0 content:term1 title:term1^1.5 host:term1^2.0)
+(url:term2^4.0 anchor:term2^2.0 content:term2 title:term2^1.5 host:term2^2.0)
url:"term1 term2"~2147483647^4.0
anchor:"term1 term2"~4^2.0
content:"term1 t
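The shape of the required clauses quoted above can be reproduced with a short sketch that expands each user term across the five fields with the boosts shown. This only builds the query string; it does not use Lucene or Nutch classes, it leaves out the optional phrase clauses (which are cut off in the quote), and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RequiredClauseSketch {
    // Field name -> boost suffix, in the order shown in the quoted query.
    static final Map<String, String> FIELD_BOOSTS = new LinkedHashMap<>();
    static {
        FIELD_BOOSTS.put("url", "^4.0");
        FIELD_BOOSTS.put("anchor", "^2.0");
        FIELD_BOOSTS.put("content", "");
        FIELD_BOOSTS.put("title", "^1.5");
        FIELD_BOOSTS.put("host", "^2.0");
    }

    // Builds one required clause: the term must match in at least one field.
    static String requiredClause(String term) {
        StringBuilder sb = new StringBuilder("+(");
        boolean first = true;
        for (Map.Entry<String, String> e : FIELD_BOOSTS.entrySet()) {
            if (!first) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(':').append(term).append(e.getValue());
            first = false;
        }
        return sb.append(')').toString();
    }

    // One required clause per term, one clause per line.
    static String requiredClauses(String... terms) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) {
                sb.append('\n');
            }
            sb.append(requiredClause(terms[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(requiredClauses("term1", "term2"));
    }
}
```

Each additional user term adds five more term clauses, which gives a feel for how quickly the expanded query grows.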

Re: Lucene performance bottlenecks

2005-12-03 Thread Andrzej Bialecki
Doug Cutting wrote: Andrzej Bialecki wrote: For a simple TermQuery, if the DF(term) is above 10%, the response time from IndexSearcher.search() is around 400ms (repeatable, after warm-up). For such complex phrase queries the response time is around 1 sec or more (again, after warm-up). Ar

Re: Lucene performance bottlenecks

2005-12-02 Thread Doug Cutting
Andrzej Bialecki wrote:
For a simple TermQuery, if the DF(term) is above 10%, the response time from IndexSearcher.search() is around 400ms (repeatable, after warm-up). For such complex phrase queries the response time is around 1 sec or more (again, after warm-up).

Are you specifying -server

Re: Lucene performance bottlenecks

2005-12-02 Thread Paul Elschot
Andrzej,

On Friday 02 December 2005 12:55, Andrzej Bialecki wrote:
> Hi,
>
> I'm doing some performance profiling of a Nutch installation, working
> with relatively large individual indexes (10 mln docs), and I'm puzzled
> with the results.
>
> Here's the listing of the index:
> -rw-r--r-- 1

Lucene performance bottlenecks

2005-12-02 Thread Andrzej Bialecki
Hi,

I'm doing some performance profiling of a Nutch installation, working with relatively large individual indexes (10 mln docs), and I'm puzzled with the results.

Here's the listing of the index:
-rw-r--r-- 1 andrzej andrzej 9803100 Dec 2 05:24 _0.f0
-rw-r--r-- 1 andrzej andrzej 9