: Oh, BTW: I just found the DisjunctionMaxQuery class, recently added it
: seems. Do you think this query structure could benefit from using it
: instead of the BooleanQuery?
DisjunctionMaxQuery kicks ass (in my opinion), and it certainly seems like
(from your query structure) it's something you
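
For readers weighing the two query types: a BooleanQuery over several fields adds up the per-field scores for a term, whereas DisjunctionMaxQuery, as its documentation describes it, takes the best per-field score and adds a tie-breaker fraction of the remaining ones. The sketch below is plain Java, not Lucene code. The class name, method names and sample scores are made up for illustration, and it assumes non-negative field scores.

```java
final class DisMaxSketch {
    // Boolean-style combination: every matching field contributes its score.
    static double sumScore(double[] fieldScores) {
        double sum = 0.0;
        for (double s : fieldScores) {
            sum += s;
        }
        return sum;
    }

    // DisjunctionMax-style combination: the best field wins, and the other
    // fields contribute only tieBreaker times their scores.
    // Assumes non-negative scores (illustration only).
    static double disMaxScore(double[] fieldScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : fieldScores) {
            sum += s;
            if (s > max) {
                max = s;
            }
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Hypothetical per-field scores for one term (url, anchor, content).
        double[] scores = {4.0, 2.0, 1.0};
        System.out.println("sum:    " + sumScore(scores));
        System.out.println("dismax: " + disMaxScore(scores, 0.1));
    }
}
```

With a tie-breaker of 0 the result is simply the best field's score, so a document matching the term in many fields no longer outranks one that matches strongly in a single field.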
Paul Elschot wrote:
There is one indexing parameter that might help performance
for BooleanScorer2, it is the skip interval in Lucene's TermInfosWriter.
The current value is 16, and there was a question about it
on 16 Oct 2005 on java-dev with title "skipInterval".
I don't know how the value of
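
To make the role of a skip interval concrete, here is a simplified in-memory model: one skip entry per skipInterval documents lets a scorer jump over whole blocks of a postings list before scanning linearly. This is only a sketch under that assumption. It is not Lucene's on-disk format or its actual skipping code, and the class and method names are hypothetical.

```java
final class SkipSketch {
    // Returns the index of the first doc id >= target, or docs.length if none.
    // docs must be sorted ascending; skipInterval must be positive.
    static int advance(int[] docs, int skipInterval, int from, int target) {
        int i = from;
        // Position of the next skip entry after index i.
        int next = (i / skipInterval + 1) * skipInterval;
        // Jump block by block while the doc at the skip entry is still too small.
        while (next < docs.length && docs[next] < target) {
            i = next;
            next += skipInterval;
        }
        // Finish with a linear scan inside the final block.
        while (i < docs.length && docs[i] < target) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        int[] docs = new int[100];
        for (int k = 0; k < docs.length; k++) {
            docs[k] = 3 * k; // hypothetical postings: every third doc id
        }
        int idx = advance(docs, 16, 0, 100);
        System.out.println("index " + idx + " holds doc " + docs[idx]);
    }
}
```

A smaller interval means more skip entries to store but shorter linear scans; a larger one means the opposite.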
On Wednesday 07 December 2005 10:51, Andrzej Bialecki wrote:
> Paul Elschot wrote:
> >On Saturday 03 December 2005 14:09, Andrzej Bialecki wrote:
> >>Paul Elschot wrote:
> >>
...
> >>>This is one of the cases in which BooleanScorer2 can be faster
> >>>than the 1.4 BooleanScorer because the 1.4 Boo
Andrzej, I think you did a great job elucidating my thoughts as well. I
heartily concur with everything you said.
Andrzej Bialecki wrote:
> Hmm... Please define what "adequate" means. :-) IMHO,
> "adequate" is when for any query the response time is well
> below 1 second. Otherwise the serv
(Moving the discussion to nutch-dev, please drop the cc: when responding)
Doug Cutting wrote:
Andrzej Bialecki wrote:
It's nice to have these couple percent... however, it doesn't solve
the main problem; I need 50 or more percent increase... :-) and I
suspect this can be achieved only by som
Andrzej Bialecki wrote:
It's nice to have these couple percent... however, it doesn't solve the
main problem; I need 50 or more percent increase... :-) and I suspect
this can be achieved only by some radical changes in the way Nutch uses
Lucene. It seems the default query structure is too compl
Yonik Seeley wrote:
if (b>0) return b;
Doing an 'and' of two bytes and checking if the result is 0 probably
requires masking operations on >8 bit processors...
Sometimes you can get a peek into how a JVM would optimize things by
looking at the asm output of the code from a C compiler.
Bot
Paul Elschot wrote:
Querying the host field like this in a web page index can be dangerous
business. For example when term1 is "wikipedia" and term2 is "org",
the query will match at least all pages from wikipedia.org.
Note that if you search for wikipedia.org in Nutch this is interpreted
as a
> if (b>0) return b;
> Doing an 'and' of two bytes and checking if the result is 0 probably
> requires masking operations on >8 bit processors...
Sometimes you can get a peek into how a JVM would optimize things by
looking at the asm output of the code from a C compiler.
Both (b>=0) and ((b&0x80)!
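
For reference, the two tests being compared can be put side by side in a variable-length int decoder. This is a minimal sketch, assuming the usual VInt layout (7 data bits per byte, high bit set when more bytes follow) and reading from a byte array rather than from Lucene's input classes; the class and method names are hypothetical. It uses `b >= 0` rather than `b > 0` so that a zero byte also takes the single-byte path.

```java
final class VIntSketch {
    // Variant 1: rely on Java's signed byte; a non-negative byte has its high bit clear.
    static int readVIntSignTest(byte[] buf) {
        int pos = 0;
        byte b = buf[pos++];
        if (b >= 0) {
            return b; // single-byte value, no continuation bit
        }
        int value = b & 0x7F;
        for (int shift = 7; ; shift += 7) {
            b = buf[pos++];
            value |= (b & 0x7F) << shift;
            if (b >= 0) {
                return value;
            }
        }
    }

    // Variant 2: test the continuation bit explicitly with a mask.
    static int readVIntMaskTest(byte[] buf) {
        int pos = 0;
        byte b = buf[pos++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] encoded = {(byte) 0xAC, 0x02}; // 300 in VInt form
        System.out.println(readVIntSignTest(encoded));
        System.out.println(readVIntMaskTest(encoded));
    }
}
```

Both variants decode the same values; whether one is faster depends on the JVM and processor, which is what the timings in this thread are probing.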
On 12/7/05, Vanlerberghe, Luc <[EMAIL PROTECTED]> wrote:
> Since 'byte' is signed in Java, can't the first test be simply written
> as
> if (b>0) return b;
> Doing an 'and' of two bytes and checking if the result is 0 probably
> requires masking operations on >8 bit processors...
Yep, that was my
all operators use ints...
Luc
-----Original Message-----
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 7 December 2005 16:11
To: java-user@lucene.apache.org
Subject: Re: Lucene performance bottlenecks
I checked out readVInt() to see if I could optimize it any...
For a random
I checked out readVInt() to see if I could optimize it any...
For a random distribution of integers <200 I was able to speed it up a
little bit, but nothing to write home about:
                 old    new    percent
Java14-client:   13547  12468  8%
Java14-server:   6047   5266   14%
Java1
Paul Elschot wrote:
On Saturday 03 December 2005 14:09, Andrzej Bialecki wrote:
Paul Elschot wrote:
In somewhat more readable layout:
+(url:term1^4.0 anchor:term1^2.0 content:term1
title:term1^1.5 host:term1^2.0)
+(url:term2^4.0 anchor:term2^2.0 content:term2
title:term2^1.5 host:
On Saturday 03 December 2005 14:09, Andrzej Bialecki wrote:
> Paul Elschot wrote:
>
> >In somewhat more readable layout:
> >
> >+(url:term1^4.0 anchor:term1^2.0 content:term1
> > title:term1^1.5 host:term1^2.0)
> >+(url:term2^4.0 anchor:term2^2.0 content:term2
> > title:term2^1.5 host:term2^
Paul Elschot wrote:
In somewhat more readable layout:
+(url:term1^4.0 anchor:term1^2.0 content:term1
title:term1^1.5 host:term1^2.0)
+(url:term2^4.0 anchor:term2^2.0 content:term2
title:term2^1.5 host:term2^2.0)
url:"term1 term2"~2147483647^4.0
anchor:"term1 term2"~4^2.0
content:"term1 t
Doug Cutting wrote:
Andrzej Bialecki wrote:
For a simple TermQuery, if the DF(term) is above 10%, the response
time from IndexSearcher.search() is around 400ms (repeatable, after
warm-up). For such complex phrase queries the response time is around
1 sec or more (again, after warm-up).
Ar
Andrzej Bialecki wrote:
For a simple TermQuery, if the DF(term) is above 10%, the response time
from IndexSearcher.search() is around 400ms (repeatable, after warm-up).
For such complex phrase queries the response time is around 1 sec or
more (again, after warm-up).
Are you specifying -server
Andrzej,
On Friday 02 December 2005 12:55, Andrzej Bialecki wrote:
> Hi,
>
> I'm doing some performance profiling of a Nutch installation, working
> with relatively large individual indexes (10 mln docs), and I'm puzzled
> with the results.
>
> Here's the listing of the index:
> -rw-r--r-- 1
Hi,
I'm doing some performance profiling of a Nutch installation, working
with relatively large individual indexes (10 mln docs), and I'm puzzled
with the results.
Here's the listing of the index:
-rw-r--r-- 1 andrzej andrzej 9803100 Dec 2 05:24 _0.f0
-rw-r--r-- 1 andrzej andrzej 9