Hmmm, something is wrong: range queries over many terms should
definitely be faster.
There are some other oddities in your results...
- the "consolidated index" shows to be slower 295ms vs 602ms... but
patch 1596 doesn't touch that code path (a single segment index).
- TEST2 (using searcher.sear
I am sorry,
but after applying this patch, the performance of my tests is worse than
on lucene-2.9-dev trunk.
TEST1: using filter.getDocIdSet(reader)
Test results (num docs = 2,940,738) using lucene-core-2.9-dev trunk:
1. Original index (12 collections * 6 months = 72 indexes) ...
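A minimal sketch (not from the thread) of what a TEST1-style measurement might look like against the 2.9 DocIdSet API; "date_doc" is the field name mentioned later in the thread, and the range bounds are placeholders:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.RangeFilter;

public class Test1Sketch {
  // Times Filter.getDocIdSet() directly, bypassing the Searcher entirely.
  static long timeFilter(IndexReader reader) throws Exception {
    RangeFilter filter =
        new RangeFilter("date_doc", "20090101", "20090411", true, true);
    long start = System.currentTimeMillis();
    DocIdSet docs = filter.getDocIdSet(reader);
    DocIdSetIterator it = docs.iterator();
    int count = 0;
    while (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
      count++;  // walk every match so no work is skipped lazily
    }
    System.out.println(count + " docs matched");
    return System.currentTimeMillis() - start;
  }
}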
OK, I think this will improve the situation:
https://issues.apache.org/jira/browse/LUCENE-1596
-Yonik
http://www.lucidimagination.com
On Fri, Apr 10, 2009 at 1:47 PM, Michael McCandless
wrote:
> We never fully explained it, but we have some ideas...
>
> It's only if you iterate each term, and ...
Ahhh, OK, perhaps that explains the sizable perf difference you're
seeing w/ optimized vs not. I'm curious to see the results of your
"merge each month into 1 index" test...
Mike
On Sat, Apr 11, 2009 at 9:21 AM, Roberto Franchini
wrote:
> On Sat, Apr 11, 2009 at 1:50 PM, Michael McCandless wrote: ...
Hmm then I'm a bit baffled again.
Because, each of your "by month" indexes presumably has a unique
subset of terms for the "date_doc" field? Meaning, a given "by month"
index will have all date_doc corresponding to that month, and a
different "by month" index would presumably have no overlap in t
On Sat, Apr 11, 2009 at 11:48 AM, Michael McCandless
wrote:
> On Sat, Apr 11, 2009 at 5:27 AM, Raf wrote:
>
[cut]
>
> You have readers from 72 different directories, but is each directory
> an optimized or unoptimized index?
Hi,
I'm Raffaella's colleague, and I'm the "indexer" while she is the "searcher" ...
On Sat, Apr 11, 2009 at 5:27 AM, Raf wrote:
> I have repeated my tests using a searcher and now the performance on 2.9 is
> much better than on 2.4.1, especially when the filter extracts a lot
> of docs.
OK, phew!
> However the same search on the consolidated index is even faster
This is ...
> ...index. This is not
> faster in 2.9.
>
> To compare speed, please use real search code (Searcher.search())!
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
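This is presumably what Uwe means by "real search code": go through an IndexSearcher instead of calling the filter directly. A sketch, where MatchAllDocsQuery and the range bounds are my placeholders:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.RangeFilter;
import org.apache.lucene.search.TopDocs;

public class Test2Sketch {
  // Runs the same range restriction through Searcher.search(), which is
  // the code path a real application actually exercises.
  static int countHits(IndexReader reader) throws Exception {
    IndexSearcher searcher = new IndexSearcher(reader);
    TopDocs hits = searcher.search(
        new MatchAllDocsQuery(),
        new RangeFilter("date_doc", "20090101", "20090411", true, true),
        10);
    return hits.totalHits;
  }
}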
Thanks Uwe,
I had already read about TrieRangeFilter on this mailing list and I thought
it could be useful to solve my problem.
I think I will try it for test purposes.
Unfortunately, I now have to solve the problem in a production system, and I
would like to avoid using an as-yet-unreleased version.
No, it is a MultiReader that contains 72 (I am sorry, I wrote a wrong number
last time) "single" readers.
Raf
Ok, here you can find some details about my tests:
MultiReader creation:
IndexReader subReader;
List<IndexReader> subReaders = new ArrayList<IndexReader>();
for (Directory dir : this.directories) {
    try {
        subReader = IndexReader.open(dir, true);  // true = open read-only
        subReaders.add(subReader);
    } catch (IOException e) {
        // log and skip the unreadable directory
    }
}
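The preview cuts off here; presumably the snippet continues by wrapping the subreaders, along these lines (my reconstruction, not Raf's actual code):

MultiReader multiReader =
    new MultiReader(subReaders.toArray(new IndexReader[subReaders.size()]));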
You got a lot of answers and questions about your index structure. Now
another idea, maybe this helps you to speed up your RangeFilter:
What type of range do you want to query? From your index statistics, it
looks like a numeric/date field on which you filter very large ranges. If
the values are ...
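Uwe is presumably pointing at TrieRangeFilter from contrib, which was later folded into core 2.9 as NumericField/NumericRangeFilter. A sketch using those core 2.9 names; the field name and precisionStep are illustrative:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.search.NumericRangeFilter;

// Indexing: store the date as a trie-encoded long (e.g. yyyyMMdd), so a
// large range resolves to a handful of term seeks per segment instead of
// an enumeration of every date term in the range.
Document doc = new Document();
doc.add(new NumericField("date_doc", 4, Field.Store.NO, true)
    .setLongValue(20090411L));

// Searching:
NumericRangeFilter<Long> filter = NumericRangeFilter.newLongRange(
    "date_doc", 4, 20090101L, 20090411L, true, true);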
On Fri, Apr 10, 2009 at 3:06 PM, Mark Miller wrote:
> 24 segments is bound to be quite a bit slower than an optimized index for
> most things
I'd be curious just how true this really is (in general)... my guess
is the "long tail of tiny segments" gets into the OS's IO cache (as
long as the syste
On Fri, Apr 10, 2009 at 3:11 PM, Mark Miller wrote:
> Mark Miller wrote:
>>
>> Michael McCandless wrote:
>>>
>>> which is why I'm baffled that Raf didn't see a speedup on
>>> upgrading.
>>>
>>> Mike
>>>
>>
>> Another point is that he may not have such a nasty set of segments - Raf
>> says he has 24 indexes ...
On Fri, Apr 10, 2009 at 3:14 PM, Mark Miller wrote:
> Raf wrote:
>>
>> We have more or less 3M documents in 24 indexes and we read all of them
>> using a MultiReader.
>>
>
> Is this a multireader containing multireaders?
Let's hear Raf's answer, but I think likely "yes". But this shouldn't
be a ...
Raf wrote:
We have more or less 3M documents in 24 indexes and we read all of them
using a MultiReader.
Is this a multireader containing multireaders?
--
- Mark
http://www.lucidimagination.com
Michael McCandless wrote:
which is why I'm baffled that Raf didn't see a speedup on
upgrading.
Mike
Another point is that he may not have such a nasty set of segments - Raf
says he has 24 indexes, which sounds like he may not have the
logarithmic sizing you normally see. If you have ...
On Fri, Apr 10, 2009 at 2:32 PM, Mark Miller wrote:
> I had thought we would also see the advantage with multi-term queries - you
> rewrite against each segment and avoid extra seeks (though not nearly as
> many as when enumerating every term). As Mike pointed out to me back when,
> though: we still ...
When I did some profiling I saw that the slow down came from tons of
extra seeks (single segment vs multisegment). What was happening was,
the first couple segments would have thousands of terms for the field,
but as the segments logarithmically shrank in size, the number of terms
for the segments ...
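To make the seek pattern concrete: a 2.4-style RangeFilter walks the merged term dictionary roughly like this (a simplified sketch of the old TermEnum idiom, not the actual Lucene source):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class RangeEnumSketch {
  // Enumerates every "date_doc" term in [lower, upper]. On a MultiReader,
  // each next() is a priority-queue merge across all segments, so the tail
  // of small segments contributes extra seeks on every step.
  static void walkRange(IndexReader reader, String lower, String upper)
      throws Exception {
    TermEnum te = reader.terms(new Term("date_doc", lower));
    try {
      do {
        Term t = te.term();
        if (t == null || !"date_doc".equals(t.field())
            || t.text().compareTo(upper) > 0) {
          break;
        }
        // ... collect the docs for term t ...
      } while (te.next());
    } finally {
      te.close();
    }
  }
}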
Hi Mike,
thank you for your answer.
I have downloaded lucene-core-2.9-dev and I have run my tests (both on the
multireader and on the consolidated index) using this new version, but the
performance is very similar to the previous results.
The big index is 7-8 times faster than the multireader version.
Raf
On Fri, Apr 10, 2009 at 10:48 AM, Michael McCandless
wrote:
> Unfortunately, in Lucene 2.4, any query that needs to enumerate Terms
> (Prefix, Wildcard, Range, etc.) has poor performance on Multi*Readers.
Do we know why this is, and if it's fixable (the MultiTermEnum, not
the higher level query ...
Unfortunately, in Lucene 2.4, any query that needs to enumerate Terms
(Prefix, Wildcard, Range, etc.) has poor performance on Multi*Readers.
I think the only workaround is to merge your indexes down to a single
index.
But, Lucene trunk (not yet released) has fixed this, so that searching
through ...
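For the merge-down workaround on 2.4, something along these lines should work; the target path and analyzer choice are placeholders:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeDown {
  // Copies the segments of all source indexes into one target index;
  // addIndexesNoOptimize avoids a full optimize of the sources first.
  static void merge(Directory[] sourceDirs, String targetPath)
      throws Exception {
    IndexWriter writer = new IndexWriter(
        FSDirectory.getDirectory(targetPath), new StandardAnalyzer(),
        true, IndexWriter.MaxFieldLength.UNLIMITED);
    writer.addIndexesNoOptimize(sourceDirs);
    writer.optimize();  // optional: collapse down to a single segment
    writer.close();
  }
}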