> Did you completely re-index?
>
> If you did, then there is some other problem - can you share (more of)
> your code?
>
> Do you know about Luke? It's an essential tool for Lucene index
> debugging:
>
> http://www.getopt.org/luke/
>
> Steve
>
> On 01/13/2010 at 8:34 PM, AlexElba wrote:
Hello,
I changed the filter to the following:

RangeFilter rangeFilter = new RangeFilter(
        "rank",
        NumberTools.longToString(rating),
        NumberTools.longToString(10),
        true, true);

and changed the index to store rank th
Thanks Steve.
Mike, for now I can not upgrade...
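For context, the idea behind NumberTools.longToString is to encode numbers as fixed-width strings whose lexicographic order matches their numeric order, so string range filters work on numeric fields. Below is a simplified plain-Java sketch of that idea only; it is not Lucene's actual encoding (which uses radix 36 and also handles negative values), and the class name is made up for illustration:

```java
// Sketch: encode numbers as fixed-width strings so that string order
// matches numeric order. Illustrative only, not NumberTools itself.
public class OrderPreservingEncodeDemo {

    // 19 digits fits any non-negative long (assumption: value >= 0).
    static String encode(long value) {
        return String.format("%019d", value);
    }

    public static void main(String[] args) {
        // With fixed-width encoding, 3 now sorts before 10:
        System.out.println(encode(3).compareTo(encode(10)) < 0); // true
    }
}
```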
--
View this message in context:
http://old.nabble.com/RangeFilter-tp27148785p27151315.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
E.g., you can left-pad the "rank" field values with zeroes:
"03", "04", ..., "10", and then create a RangeFilter over "03" .. "10". You
will of course need to left-zero-pad to at least the maximum character length
of the largest rank.
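The padding advice above can be sketched in plain Java (the class name and the two-digit width are assumptions for illustration; Lucene itself just sees the padded terms):

```java
// Sketch: left-zero-padding makes lexicographic order agree with numeric
// order, so a string RangeFilter over "03" .. "10" behaves as expected.
public class PaddedRankDemo {

    // Pad to the width of the largest rank (2 digits assumed here).
    static String pad(int rank) {
        return String.format("%02d", rank);
    }

    // True if term falls inside the inclusive lexicographic range.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(pad(3));                      // "03"
        System.out.println(inRange(pad(5), "03", "10")); // true
        System.out.println(inRange(pad(2), "03", "10")); // false
    }
}
```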
Hello,
I am currently using Lucene 2.4 and have documents with 3 fields:
id
name
rank
I have a query and a filter, but when I try to use a range filter on rank I am
not getting any results back:

RangeFilter rangeFilter = new RangeFilter("rank", "3", "10", true, true);
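A quick plain-Java illustration of why this filter matches nothing: Lucene compares terms lexicographically, and as strings "3" sorts after "10", so the requested range is empty. The class and helper below are only for illustration, not Lucene API:

```java
// Sketch: why a lexicographic range ["3" .. "10"] over string terms is empty.
// Lucene's RangeFilter compares index terms as strings, not as numbers.
public class LexRangeDemo {

    // True if term falls inside the inclusive lexicographic range.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // "3" sorts after "10" as a string, so lower > upper:
        System.out.println("3".compareTo("10") > 0);   // true
        System.out.println(inRange("5", "3", "10"));   // false -> no hits
    }
}
```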
Hello all,
What is the difference between using RangeFilter and ConstantScoreRangeQuery? Any
difference in performance?
I am using a datetime field (MMDDhhmm). If I store the field with day
precision (MMDD), will the range filter be faster?
Regards
Ganesh
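The intuition behind the precision question: coarser precision means fewer unique terms for a range to enumerate. A plain-Java sketch of that effect (the helper below is illustrative, not a Lucene API):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: storing dates at day precision (MMDD) instead of minute precision
// (MMDDhhmm) shrinks the number of unique terms a range must enumerate.
public class DatePrecisionDemo {

    // Count distinct terms after truncating each value to `precision` chars.
    static int distinctTerms(String[] values, int precision) {
        Set<String> terms = new HashSet<String>();
        for (String v : values) {
            terms.add(v.substring(0, precision));
        }
        return terms.size();
    }

    public static void main(String[] args) {
        String[] values = { "01130834", "01131015", "01140907", "01142230" };
        System.out.println(distinctTerms(values, 8)); // 4 minute-precision terms
        System.out.println(distinctTerms(values, 4)); // 2 day-precision terms
    }
}
```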
- TEST2 (using searcher.search) should not be affected by this patch
at all, yet some of the results are shown to be twice as slow. Seems
like there may be a lot of measurement noise in these tests.
Although looking at RangeFilter quickly, I do see an issue that would
prevent the optimizations in 1596 from kicking in.
I am sorry,
but after applying this patch, the performance in my tests is worse than
on the lucene-2.9-dev trunk.
TEST1: using filter.getDocIdSet(reader)
Test results (Num docs = 2,940,738) using lucene-core-2.9-dev trunk
1. Original index (12 collections * 6 months = 72 indexes)
OK, I think this will improve the situation:
https://issues.apache.org/jira/browse/LUCENE-1596
-Yonik
http://www.lucidimagination.com
On Fri, Apr 10, 2009 at 1:47 PM, Michael McCandless
wrote:
> We never fully explained it, but we have some ideas...
>
> It's only if you iterate each term, and d
> > -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Saturday, April 11, 2009 6:42 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: RangeFilter performance problem using MultiReader
>
> OK, I scanned all the e-mails in this thread so I may be way off base, but
> has anyone yet asked the basic question of whether the granularity of the
> dates is really necessary ?
>
> Raf and Roberto:
>
> It
> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Saturday, April 11, 2009 4:03 PM
> To: java-user@lucene.apache.org
> Subject: Re: RangeFilter performance problem using MultiReader
Ahhh, OK, perhaps that explains the sizable perf difference you're
seeing w/ optimized vs not. I'm curious to see the results of your
"merge each month into 1 index" test...
Mike
On Sat, Apr 11, 2009 at 9:21 AM, Roberto Franchini
wrote:
On Sat, Apr 11, 2009 at 1:50 PM, Michael McCandless
wrote:
Hmm then I'm a bit baffled again.
Because, each of your "by month" indexes presumably has a unique
subset of terms for the "date_doc" field? Meaning, a given "by month"
index will have all date_doc corresponding to that month, and a
different "by month" index would presumably have no overlap in terms.
On Sat, Apr 11, 2009 at 11:48 AM, Michael McCandless
wrote:
> On Sat, Apr 11, 2009 at 5:27 AM, Raf wrote:
>
[cut]
>
> You have readers from 72 different directories, but is each directory
> an optimized or unoptimized index?
Hi,
I'm Raffaella's colleague, and I'm the "indexer" while she is the "searcher".
On Sat, Apr 11, 2009 at 5:27 AM, Raf wrote:
> I have repeated my tests using a searcher and now the performance on 2.9 is
> much better than on 2.4.1, especially when the filter extracts a lot
> of docs.
OK, phew!
> However the same search on the consolidated index is even faster
This i
on the production environment, so I think I will have to
consolidate indexes for now.
Thanks a lot for your help,
Raf
If you are interested, here you can find the new test code and a result
comparison between 2.4.1 and 2.9:
*RangeFilter searcher test*
@Test
public void testRangeFilterSearch
> To: java-user@lucene.apache.org
> Subject: Re: RangeFilter performance problem using MultiReader
>
> Thanks Uwe,
> I had already read about TrieRangeFilter on this mailing list and I
> thought
> it could be useful to solve my problem.
> I think I will trie it for test purposes.
> -Original Message-
> From: Raf [mailto:r.ventag...@gmail.com]
> Sent: Saturday, April 11, 2009 9:07 AM
> To: java-user@lucene.apache.org
> Subject: Re: RangeFilter performance problem using MultiReader
>
> Ok, here you can find some d
No, it is a MultiReader that contains 72 (I am sorry, I wrote a wrong number
last time) "single" readers.
Raf
On Fri, Apr 10, 2009 at 9:14 PM, Mark Miller wrote:
> Raf wrote:
>> We have more or less 3M documents in 24 indexes and we read all of them
>> using a MultiReader.
>
> Is this a multireader containing multireaders?
(...) {
... ... ...
}
}
this.reader = new MultiReader(subReaders.toArray(new IndexReader[] {}));
(where *this.directories* is a List containing all my index
directories).
*RangeFilter test*
@Test
public void testRangeFilter() throws IOException, ParseException
You got a lot of answers and questions about your index structure. Now
another idea, maybe this helps you to speed up your RangeFilter:
What type of range do you want to query? From your index statistics, it
looks like a numeric/date field from which you filter very large ranges. If
the values
On Fri, Apr 10, 2009 at 3:06 PM, Mark Miller wrote:
> 24 segments is bound to be quite a bit slower than an optimized index for
> most things
I'd be curious just how true this really is (in general)... my guess
is the "long tail of tiny segments" gets into the OS's IO cache (as
long as the system
On Fri, Apr 10, 2009 at 3:14 PM, Mark Miller wrote:
> Raf wrote:
>>
>> We have more or less 3M documents in 24 indexes and we read all of them
>> using a MultiReader.
>>
>
> Is this a multireader containing multireaders?
Let's hear Raf's answer, but I think likely "yes". But this shouldn't
be a
Raf wrote:
We have more or less 3M documents in 24 indexes and we read all of them
using a MultiReader.
Is this a multireader containing multireaders?
--
- Mark
http://www.lucidimagination.com
Michael McCandless wrote:
which is why I'm baffled that Raf didn't see a speedup on
upgrading.
Mike
Another point is that he may not have such a nasty set of segments - Raf
says he has 24 indexes, which sounds like he may not have the
logarithmic sizing you normally see. If you have somewh
the over-seeking problem. FieldCache does that, and
RangeFilter on 2.4 does that, but RangeFilter (or RangeQuery with
constant score mode) on 2.9 should not (they should do it per
segment), which is why I'm baffled that Raf didn't see a speedup on
upgrading.
Mike
When I did some profiling I saw that the slow down came from tons of
extra seeks (single segment vs multisegment). What was happening was,
the first couple segments would have thousands of terms for the field,
but as the segments logarithmically shrank in size, the number of terms
for the segme
> previous ones.
> The big index is 7/8 times faster than the multireader version.
Hmmm, interesting!
Can you provide more details about your tests? EG the code fragment
showing your query, the creation of the MultiReader, how you run the
search, etc.?
Is the field that you're applying the Range
On Fri, Apr 10, 2009 at 10:48 AM, Michael McCandless
wrote:
> Unfortunately, in Lucene 2.4, any query that needs to enumerate Terms
> (Prefix, Wildcard, Range, etc.) has poor performance on Multi*Readers.
Do we know why this is, and if it's fixable (the MultiTermEnum, not
the higher level query o
using RangeFilters and we think there are
some performance issues caused by MultiReader.
We have more or less 3M documents in 24 indexes and we read all of them
using a MultiReader.
If we do a search using only terms, there are no problems, but if we add
to the same search terms a RangeFilter that extracts a large subset of the
documents (e.g. 500K), it takes a lot of time to execute (about 15s).
In order to identify the problem, we have tried to consolidate the index: so
now we have the same 3M docs in a single 10GB index.
If we repeat the same
vivek sar wrote:
I've a field as NO_NORM, does it have to be untokenized to be able to
sort on it?
NO_NORMS is the same as UNTOKENIZED + omitNorms, so you can sort on that.
Antony
I've a field as NO_NORM, does it have to be untokenized to be able to
sort on it?
On Jan 21, 2008 12:47 PM, Antony Bowesman <[EMAIL PROTECTED]> wrote:
> vivek sar wrote:
> > I need to be able to sort on optime as well, thus need to store it.
>
> Lucene's default sorting does not need the field to
vivek sar wrote:
I need to be able to sort on optime as well, thus need to store it.
Lucene's default sorting does not need the field to be stored, only indexed as
untokenized.
Antony
> > From: vivek sar <[EMAIL PROTECTED]>
> > To: java-user@lucene.apache.org
> > Sent: Saturday, January 19, 2008 8:06:25 PM
> > Subject: Using RangeFilter
> Why not just index them and not
> store them if index size is a concern?
>
> Otis
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
Hi,
I have a requirement to filter out documents by date range. I'm using
RangeFilter (in combination with FilteredQuery) to do this. I was under
the impression the filtering is done on documents, thus I'm just
storing the date values, but not indexing them. As every new document
would
Thanks for clarifying this, Chris!
I agree with you that javadocs usually should document all they do, but often
they skip a few important things.
Chris Hostetter wrote:
: Does anyone know if the RangeFilter is a cached filter? I could not
: tell from the api.
Generally speaking
: Does anyone know if the RangeFilter is a cached filter? I could not
: tell from the api.
Generally speaking classes only document what they do, not what they
*don't* do ... so if the javadocs don't say anything about caching, then
it doesn't have any caching.
more specifically
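For caching, Lucene does ship a CachingWrapperFilter that wraps a non-caching filter. The plain-Java sketch below only illustrates the idea of memoizing a filter's result per reader; the Filter interface here is a stand-in for illustration, not Lucene's actual API:

```java
import java.util.Map;
import java.util.WeakHashMap;

// Sketch: memoize a filter's result per reader so repeated searches
// don't recompute it. Illustrative stand-in, not CachingWrapperFilter.
public class CachingFilterDemo {

    interface Filter {
        boolean[] bits(Object reader); // stand-in for Filter.bits(IndexReader)
    }

    static class CachingWrapper implements Filter {
        private final Filter inner;
        // WeakHashMap lets cache entries go away with their reader.
        private final Map<Object, boolean[]> cache = new WeakHashMap<Object, boolean[]>();
        int innerCalls = 0; // exposed so the demo can observe cache hits

        CachingWrapper(Filter inner) { this.inner = inner; }

        public boolean[] bits(Object reader) {
            boolean[] cached = cache.get(reader);
            if (cached == null) {
                innerCalls++;
                cached = inner.bits(reader);
                cache.put(reader, cached);
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        Filter expensive = new Filter() {
            public boolean[] bits(Object reader) { return new boolean[] { true, false }; }
        };
        CachingWrapper cached = new CachingWrapper(expensive);
        Object reader = new Object();
        cached.bits(reader);
        cached.bits(reader); // second call is served from the cache
        System.out.println(cached.innerCalls); // 1
    }
}
```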
Hi All,
Does anyone know if the RangeFilter is a cached filter? I could not
tell from the api.
Thanks!
Jay
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
See
http://wiki.apache.org/jakarta-lucene/FilteringOptions
--- Anton Potehin <[EMAIL PROTECTED]> wrote:
> Which is faster, RangeQuery or RangeFilter?
>
>
Which is faster, RangeQuery or RangeFilter?
: I am trying to filter my search using RangeFilter class but i get
: BooleanQuery TooManyClauses exception.
You aren't using a RangeFilter, you are using a RangeQuery ... they are
very different beasts. RangeQuery works fine for small ranges, or when
you want the term frequencies of the matching terms to affect the score.
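The difference can be illustrated in plain Java: a RangeQuery in this era of Lucene rewrites into one BooleanQuery clause per matching term, and trips BooleanQuery's default 1024-clause limit on large ranges, whereas a RangeFilter just marks matching documents in a bit set. The classes below are illustrative stand-ins, not Lucene API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a RangeQuery needs one BooleanQuery clause per matching term,
// which blows past the default 1024-clause limit on wide ranges.
public class ClauseExpansionDemo {

    static final int MAX_CLAUSES = 1024; // BooleanQuery's default limit

    // Number of clauses a range query would need for these index terms.
    static int clausesFor(List<String> indexTerms, String lower, String upper) {
        int clauses = 0;
        for (String t : indexTerms) {
            if (t.compareTo(lower) >= 0 && t.compareTo(upper) <= 0) {
                clauses++;
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // 5000 zero-padded terms, e.g. dates or ids.
        List<String> terms = new ArrayList<String>();
        for (int i = 0; i < 5000; i++) {
            terms.add(String.format("%04d", i));
        }
        int needed = clausesFor(terms, "0000", "2000");
        System.out.println(needed);                // 2001 matching terms
        System.out.println(needed > MAX_CLAUSES);  // true -> TooManyClauses
    }
}
```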
To: java-user@lucene.apache.org
Subject: "filtering" using RangeFilter class
Hi All,
I am trying to filter my search using RangeFilter class but i get
BooleanQuery TooManyClauses exception.
Exception in thread "main"
org.apache.lucene.search.BooleanQuery$TooManyClauses
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.ja
: I downloaded the source code of 1.4.3 but did not find the source of
: RangeFilter.
: I could not find it in the sandbox either?
:
: RangeFilter, where art thou?
RangeFilter was committed to the core, but after 1.4.3 was released...
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java
I downloaded the source code of 1.4.3 but did not find the source of
RangeFilter.
I could not find it in the sandbox either?
RangeFilter, where art thou?