Re: RangeFilter

2010-01-15 Thread Ian Lea
> >> Did you completely re-index? >> >> If you did, then there is some other problem - can you share (more of) >> your code? >> >> Do you know about Luke?  It's an essential tool for Lucene index >> debugging: >> >

RE: RangeFilter

2010-01-14 Thread AlexElba
u share (more of) > your code? > > Do you know about Luke? It's an essential tool for Lucene index > debugging: > >http://www.getopt.org/luke/ > > Steve > > On 01/13/2010 at 8:34 PM, AlexElba wrote: >> >> Hello, >> >> I change

RE: RangeFilter

2010-01-13 Thread Steven A Rowe
> Hello, > > I changed the filter as follows: > RangeFilter rangeFilter = new RangeFilter("rank", NumberTools.longToString(rating), NumberTools.longToString(10), true, true);

Re: RangeFilter

2010-01-13 Thread AlexElba
Hello, I changed the filter as follows: RangeFilter rangeFilter = new RangeFilter("rank", NumberTools.longToString(rating), NumberTools.longToString(10), true, true); and changed the index to store rank th

Re: RangeFilter

2010-01-13 Thread AlexElba
Thanks Steve. Mike, for now I cannot upgrade... -- View this message in context: http://old.nabble.com/RangeFilter-tp27148785p27151315.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: RangeFilter

2010-01-13 Thread Michael McCandless
left-pad the "rank" field values with > zeroes: "03", "04", ..., "10", and then create a RangeFilter over "03" .. > "10".  You will of course need to left-zero-pad to at least the maximum > character length of the largest rank.

RE: RangeFilter

2010-01-13 Thread Steven A Rowe
E.g., you can left-pad the "rank" field values with zeroes: "03", "04", ..., "10", and then create a RangeFilter over "03" .. "10". You will of course need to left-zero-pad to at least the maximum character length of the la
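The padding advice above can be illustrated without Lucene at all. The following is a minimal sketch, assuming only that range bounds are compared as plain strings (lexicographically); the class name `ZeroPadRangeDemo` and the helpers `pad` and `filter` are hypothetical and are not part of the Lucene API. It shows why a range of "3" .. "10" matches nothing, while "03" .. "10" matches the intended ranks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ZeroPadRangeDemo {

    // Hypothetical helper: left-pad a non-negative number with zeroes to a fixed width.
    public static String pad(long value, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    // Keep the terms that fall inside [lower, upper], compared as plain strings.
    public static List<String> filter(List<String> terms, String lower, String upper) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<>();
        List<String> padded = new ArrayList<>();
        for (long rank = 1; rank <= 10; rank++) {
            raw.add(Long.toString(rank));
            padded.add(pad(rank, 2));
        }
        // As strings, "3" sorts after "10", so this range is empty.
        System.out.println(filter(raw, "3", "10"));
        // With equal-width terms, string order agrees with numeric order.
        System.out.println(filter(padded, "03", "10"));
    }
}
```

The pad width has to cover the longest rank that will ever be indexed, which is the point made in the message about padding to the maximum character length.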

RangeFilter

2010-01-13 Thread AlexElba
Hello, I am currently using Lucene 2.4 and have documents with 3 fields (id, name, rank), plus a query and a filter. When I try to use a range filter on rank, I do not get any results back: RangeFilter rangeFilter = new RangeFilter("rank", "3", "10", true, true

RangeFilter and ConstantScoreRangeQuery

2009-07-20 Thread Ganesh
Hello all, What is the difference between using RangeFilter and ConstantScoreRangeQuery? Is there any difference in performance? I am using a datetime field (MMDDhhmm). If I store the field with date precision (MMDD), will the range filter be faster? Regards Ganesh

Re: RangeFilter performance problem using MultiReader

2009-04-12 Thread Yonik Seeley
- TEST2 (using searcher.search) should not be affected by this patch at all, yet some of the results are shown to be twice as slow. Seems like there may be a lot of measurement noise in these tests. Although looking at RangeFilter quickly, I do see an issue that would prevent the optimizations in 1596 from ki

Re: RangeFilter performance problem using MultiReader

2009-04-12 Thread Raf
I am sorry, but after applying this patch, the performance in my tests is worse than on lucene-2.9-dev trunk. TEST1: using *filter.getDocIdSet(reader)*; *Test results* (Num docs = 2,940,738) using lucene-core-2.9-dev trunk: *1 Original index (12 collections * 6 months = 72 indexes)*

Re: RangeFilter performance problem using MultiReader

2009-04-11 Thread Yonik Seeley
OK, I think this will improve the situation: https://issues.apache.org/jira/browse/LUCENE-1596 -Yonik http://www.lucidimagination.com On Fri, Apr 10, 2009 at 1:47 PM, Michael McCandless wrote: > We never fully explained it, but we have some ideas... > > It's only if you iterate each term, and d

Re: RangeFilter performance problem using MultiReader

2009-04-11 Thread Erick Erickson
en > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Erick Erickson [mailto:erickerick...@gmail.com] > > Sent: Saturday, April 11, 2009 6:42 PM > > To: java-user@lucene.apache.org > > Subject: Re: RangeFilter perfor

RE: RangeFilter performance problem using MultiReader

2009-04-11 Thread Uwe Schindler
> Subject: Re: RangeFilter performance problem using MultiReader > > OK, I scanned all the e-mails in this thread so I may be way off base, but > has anyone yet asked the basic question of whether the granularity of the > dates is really necessary ? > > Raf and Roberto: > > It

Re: RangeFilter performance problem using MultiReader

2009-04-11 Thread Erick Erickson
er-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Michael McCandless [mailto:luc...@mikemccandless.com] > > Sent: Saturday, April 11, 2009 4:03 PM > > To: java-user@lucene.apache.org > > Su

RE: RangeFilter performance problem using MultiReader

2009-04-11 Thread Uwe Schindler
...@thetaphi.de > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Saturday, April 11, 2009 4:03 PM > To: java-user@lucene.apache.org > Subject: Re: RangeFilter performance problem using MultiReader > > Ahhh, OK, perhaps that expl

Re: RangeFilter performance problem using MultiReader

2009-04-11 Thread Michael McCandless
Ahhh, OK, perhaps that explains the sizable perf difference you're seeing w/ optimized vs not. I'm curious to see the results of your "merge each month into 1 index" test... Mike On Sat, Apr 11, 2009 at 9:21 AM, Roberto Franchini wrote: > On Sat, Apr 11, 2009 at 1:50 PM, Michael McCandless > w

Re: RangeFilter performance problem using MultiReader

2009-04-11 Thread Roberto Franchini
On Sat, Apr 11, 2009 at 1:50 PM, Michael McCandless wrote: > Hmm then I'm a bit baffled again. > > Because, each of your "by month" indexes presumably has a unique > subset of terms for the "date_doc" field?  Meaning, a given "by month" > index will have all date_doc corresponding to that month, a

Re: RangeFilter performance problem using MultiReader

2009-04-11 Thread Michael McCandless
Hmm then I'm a bit baffled again. Because, each of your "by month" indexes presumably has a unique subset of terms for the "date_doc" field? Meaning, a given "by month" index will have all date_doc corresponding to that month, and a different "by month" index would presumably have no overlap in t

Re: RangeFilter performance problem using MultiReader

2009-04-11 Thread Roberto Franchini
On Sat, Apr 11, 2009 at 11:48 AM, Michael McCandless wrote: > On Sat, Apr 11, 2009 at 5:27 AM, Raf wrote: > [cut] > > You have readers from 72 different directories, but is each directory > an optimized or unoptimized index? Hi, I'm Raffaella's colleague, and I'm the "indexer" while she is the "s

Re: RangeFilter performance problem using MultiReader

2009-04-11 Thread Michael McCandless
On Sat, Apr 11, 2009 at 5:27 AM, Raf wrote: > I have repeated my tests using a searcher and now the performance on 2.9 is > much better than on 2.4.1, especially when the filter extracts a lot > of docs. OK, phew! > However the same search on the consolidated index is even faster This i

Re: RangeFilter performance problem using MultiReader

2009-04-11 Thread Raf
on the production environment, so I think I will have to consolidate indexes for now. Thanks a lot for your help, Raf If you are interested, here you can find the new test code and a result comparison between 2.4.1 and 2.9: *RangeFilter searcher test* @Test public void testRangeFilterSearch

RE: RangeFilter performance problem using MultiReader

2009-04-11 Thread Uwe Schindler
o: java-user@lucene.apache.org > Subject: Re: RangeFilter performance problem using MultiReader > > Thanks Uwe, > I had already read about TrieRangeFilter on this mailing list and I > thought > it could be useful to solve my problem. > I think I will trie it for test purposes.

RE: RangeFilter performance problem using MultiReader

2009-04-11 Thread Uwe Schindler
eMail: u...@thetaphi.de > -Original Message- > From: Raf [mailto:r.ventag...@gmail.com] > Sent: Saturday, April 11, 2009 9:07 AM > To: java-user@lucene.apache.org > Subject: Re: RangeFilter performance problem using MultiReader > > Ok, here you can find some d

Re: RangeFilter performance problem using MultiReader

2009-04-11 Thread Raf
ons about your index structure. Now > another idea, maybe this helps you to speed up your RangeFilter: > > What type of range do you want to query? From your index statistics, it > looks like a numeric/date field from which you filter very large ranges. If > the values are very fine-

Re: RangeFilter performance problem using MultiReader

2009-04-11 Thread Raf
No, it is a MultiReader that contains 72 (I am sorry, I wrote a wrong number last time) "single" readers. Raf On Fri, Apr 10, 2009 at 9:14 PM, Mark Miller wrote: > Raf wrote: > >> >> We have more or less 3M documents in 24 indexes and we read all of them >> using a MultiReader. >> >> > > Is this

Re: RangeFilter performance problem using MultiReader

2009-04-11 Thread Raf
(...) { ... ... ... } } this.reader = new MultiReader(subReaders.toArray(new IndexReader[] {})); (where *this.directories* is a List containing all my index directories). *RangeFilter test* @Test public void testRangeFilter() throws IOException, ParseException

RE: RangeFilter performance problem using MultiReader

2009-04-10 Thread Uwe Schindler
You got a lot of answers and questions about your index structure. Now another idea, maybe this helps you to speed up your RangeFilter: What type of range do you want to query? From your index statistics, it looks like a numeric/date field from which you filter very large ranges. If the values

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Michael McCandless
On Fri, Apr 10, 2009 at 3:06 PM, Mark Miller wrote: > 24 segments is bound to be quite a bit slower than an optimized index for > most things I'd be curious just how true this really is (in general)... my guess is the "long tail of tiny segments" gets into the OS's IO cache (as long as the syste

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Michael McCandless
On Fri, Apr 10, 2009 at 3:11 PM, Mark Miller wrote: > Mark Miller wrote: >> >> Michael McCandless wrote: >>> >>> which is why I'm baffled that Raf didn't see a speedup on >>> upgrading. >>> >>> Mike >>> >> >> Another point is that he may not have such a nasty set of segments - Raf >> says he has 2

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Michael McCandless
On Fri, Apr 10, 2009 at 3:14 PM, Mark Miller wrote: > Raf wrote: >> >> We have more or less 3M documents in 24 indexes and we read all of them >> using a MultiReader. >> > > Is this a multireader containing multireaders? Let's hear Raf's answer, but I think likely "yes". But this shouldn't be a

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Mark Miller
Raf wrote: We have more or less 3M documents in 24 indexes and we read all of them using a MultiReader. Is this a multireader containing multireaders? -- - Mark http://www.lucidimagination.com - To unsubscribe, e-mail

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Mark Miller
Mark Miller wrote: Michael McCandless wrote: which is why I'm baffled that Raf didn't see a speedup on upgrading. Mike Another point is that he may not have such a nasty set of segments - Raf says he has 24 indexes, which sounds like he may not have the logarithmic sizing you normally see

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Mark Miller
Michael McCandless wrote: which is why I'm baffled that Raf didn't see a speedup on upgrading. Mike Another point is that he may not have such a nasty set of segments - Raf says he has 24 indexes, which sounds like he may not have the logarithmic sizing you normally see. If you have somewh

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Mark Miller
he over-seeking problem. FieldCache does that, and RangeFilter on 2.4 does that, but RangeFilter (or RangeQuery with constant score mode) on 2.9 should not (they should do it per segment), which is why I'm baffled that Raf didn't see a speedup on upgrading. Mike Ah, right - anything ut

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Michael McCandless
king problem. FieldCache does that, and RangeFilter on 2.4 does that, but RangeFilter (or RangeQuery with constant score mode) on 2.9 should not (they should do it per segment), which is why I'm baffled that Raf didn't see a speedup on upgrading. Mike

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Mark Miller
When I did some profiling I saw that the slowdown came from tons of extra seeks (single segment vs multisegment). What was happening was, the first couple of segments would have thousands of terms for the field, but as the segments logarithmically shrank in size, the number of terms for the segme

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Michael McCandless
evious ones. > The big index is 7/8 times faster than multireader version. Hmmm, interesting! Can you provide more details about your tests? EG the code fragment showing your query, the creation of the MultiReader, how you run the search, etc.? Is the field that you're applying the Range

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Michael McCandless
On Fri, Apr 10, 2009 at 11:03 AM, Yonik Seeley wrote: > On Fri, Apr 10, 2009 at 10:48 AM, Michael McCandless > wrote: >> Unfortunately, in Lucene 2.4, any query that needs to enumerate Terms >> (Prefix, Wildcard, Range, etc.) has poor performance on Multi*Readers. > > Do we know why this is, and

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Raf
ve more or less 3M documents in 24 indexes and we read all of them > > using a MultiReader. > > If we do a search using only terms, there are no problems, but if we > add > > to the same search terms a RangeFilter that extracts a large subset of > the > > documents (e.g. 500K

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Yonik Seeley
On Fri, Apr 10, 2009 at 10:48 AM, Michael McCandless wrote: > Unfortunately, in Lucene 2.4, any query that needs to enumerate Terms > (Prefix, Wildcard, Range, etc.) has poor performance on Multi*Readers. Do we know why this is, and if it's fixable (the MultiTermEnum, not the higher level query o

Re: RangeFilter performance problem using MultiReader

2009-04-10 Thread Michael McCandless
using RangeFilters and we think there are > some performance issues caused by MultiReader. > > We have more or less 3M documents in 24 indexes and we read all of them > using a MultiReader. > If we do a search using only terms, there are no problems, but if we add > to the same se

RangeFilter performance problem using MultiReader

2009-04-10 Thread Raf
to the same search terms a RangeFilter that extracts a large subset of the documents (e.g. 500K), it takes a lot of time to execute (about 15s). In order to identify the problem, we have tried to consolidate the index: so now we have the same 3M docs in a single 10GB index. If we repeat the same

Re: Using RangeFilter

2008-01-24 Thread Antony Bowesman
vivek sar wrote: I have a field as NO_NORM; does it have to be untokenized to be able to sort on it? NO_NORMS is the same as UNTOKENIZED + omitNorms, so you can sort on that. Antony

Re: Using RangeFilter

2008-01-24 Thread vivek sar
I have a field as NO_NORM; does it have to be untokenized to be able to sort on it? On Jan 21, 2008 12:47 PM, Antony Bowesman <[EMAIL PROTECTED]> wrote: > vivek sar wrote: > > I need to be able to sort on optime as well, thus need to store it. > > Lucene's default sorting does not need the field to

Re: Using RangeFilter

2008-01-21 Thread Antony Bowesman
vivek sar wrote: I need to be able to sort on optime as well, thus need to store it. Lucene's default sorting does not need the field to be stored, only indexed as untokenized. Antony

Re: Using RangeFilter

2008-01-19 Thread Shai Erera
l Message > > From: vivek sar <[EMAIL PROTECTED]> > > To: java-user@lucene.apache.org > > Sent: Saturday, January 19, 2008 8:06:25 PM > > Subject: Using RangeFilter > > > > Hi, > > > > I have a requirement to filter out documents by date rang

Re: Using RangeFilter

2008-01-19 Thread vivek sar
y not just index them and not > store them if index size is a concern? > > Otis > > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > - Original Message > From: vivek sar <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent:

Re: Using RangeFilter

2008-01-19 Thread Otis Gospodnetic
g Sent: Saturday, January 19, 2008 8:06:25 PM Subject: Using RangeFilter Hi, I have a requirement to filter out documents by date range. I'm using RangeFilter (in combination with FilteredQuery) to do this. I was under the impression the filtering is done on documents, thus I'm just storing the

Using RangeFilter

2008-01-19 Thread vivek sar
Hi, I have a requirement to filter out documents by date range. I'm using RangeFilter (in combination with FilteredQuery) to do this. I was under the impression the filtering is done on documents, thus I'm just storing the date values, but not indexing them. As every new document would

Re: RangeFilter

2007-07-10 Thread Jay Yu
Thanks for clarifying this, Chris! I agree with you that javadocs usually should document everything they do, but oftentimes they skip a few important things they do. Chris Hostetter wrote: : Does anyone know if the RangeFilter is a cached filter? I could not : tell from the api. Generally speaking

Re: RangeFilter

2007-07-10 Thread Chris Hostetter
: Does anyone know if the RangeFilter is a cached filter? I could not : tell from the api. Generally speaking classes only document what they do, not what they *don't* do ... so if the javadocs don't say anything about caching, then it doesn't have any caching. more specifically

RangeFilter

2007-07-10 Thread Jay Yu
Hi All, Does anyone know if the RangeFilter is a cached filter? I could not tell from the api. Thanks! Jay

Re: RangeQuery and RangeFilter

2006-03-08 Thread mark harwood
See http://wiki.apache.org/jakarta-lucene/FilteringOptions --- Anton Potehin <[EMAIL PROTECTED]> wrote: > Which is faster, RangeQuery or RangeFilter? >

RangeQuery and RangeFilter

2006-03-08 Thread Anton Potehin
Which is faster, RangeQuery or RangeFilter?

Re: "filtering" using RangeFilter class

2006-03-03 Thread Chris Hostetter
: I am trying to filter my search using RangeFilter class but I get : BooleanQuery TooManyClauses exception. You aren't using a RangeFilter, you are using a RangeQuery ... they are very different beasts. RangeQuery works fine for small ranges, or when you want the term frequencies of the
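To make the distinction above concrete: in Lucene versions of this era, a RangeQuery was rewritten into a BooleanQuery with one clause per indexed term inside the range, and BooleanQuery refuses more than 1024 clauses by default. The following is a rough sketch of that counting only; it does not call Lucene, the class `RangeExpansionDemo` and its methods are hypothetical, and the eight-digit terms are made-up stand-ins for real index terms.

```java
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

public class RangeExpansionDemo {

    // BooleanQuery's default maximum clause count in Lucene.
    public static final int MAX_CLAUSES = 1024;

    // Build n distinct, fixed-width terms as a stand-in for an index's term dictionary.
    public static NavigableSet<String> buildTerms(int n) {
        NavigableSet<String> terms = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            terms.add(String.format(Locale.ROOT, "%08d", i));
        }
        return terms;
    }

    // One clause would be needed for every indexed term inside [lower, upper].
    public static int clausesNeeded(NavigableSet<String> terms, String lower, String upper) {
        return terms.subSet(lower, true, upper, true).size();
    }

    // True when expanding the range into per-term clauses would exceed the limit.
    public static boolean wouldOverflow(NavigableSet<String> terms, String lower, String upper) {
        return clausesNeeded(terms, lower, upper) > MAX_CLAUSES;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = buildTerms(2000);
        // A narrow range stays well under the limit.
        System.out.println(clausesNeeded(terms, "00000000", "00000099"));
        // A wide range over many distinct terms exceeds it.
        System.out.println(wouldOverflow(terms, "00000000", "00001999"));
    }
}
```

A filter avoids this expansion because it marks matching documents in a bit set instead of building one query clause per term, which is why the reply steers wide ranges toward RangeFilter.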

RE: "filtering" using RangeFilter class

2006-03-03 Thread Seeta Somagani
: java-user@lucene.apache.org Subject: "filtering" using RangeFilter class Hi All, I am trying to filter my search using RangeFilter class but I get BooleanQuery TooManyClauses exception. Exception in thread "main" org.apache.lucene.search.BooleanQuer

"filtering" using RangeFilter class

2006-03-03 Thread Urvashi Gadi
Hi All, I am trying to filter my search using RangeFilter class but I get BooleanQuery TooManyClauses exception. Exception in thread "main" org.apache.lucene.search.BooleanQuery$TooManyClauses at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.ja

Re: RangeFilter source

2005-10-20 Thread Chris Hostetter
: I downloaded the source code of 1.4.3 but did not find the source of : RangeFilter. : I could not find it in the sandbox either? : : RangeFilter, where art thou? RangeFilter was committed to the core, but after 1.4.3 was released... http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java

RangeFilter source

2005-10-20 Thread Sharma, Siddharth
I downloaded the source code of 1.4.3 but did not find the source of RangeFilter. I could not find it in the sandbox either? RangeFilter, where art thou?