Re: Using filters to speed up queries

2010-10-25 Thread Michael McCandless
Here's the paper I was thinking of (Robert found this): http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.159.9682 ... For example, note this sentence from the abstract: "We show that the first implementation, based on a postprocessing approach, allows an arbitrary user to obtain information about…"

Re: Using filters to speed up queries

2010-10-24 Thread Paul Elschot
Some further speed-up may be possible when the same combination of filters (user account and date range here) is reused for another query. The combined filter can then be built once as an OpenBitSetDISI (in the util package) and kept around for reuse.
Regards,
Paul Elschot
On Sunday 24 October 2010 12:34:
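A minimal sketch of the reuse Paul describes, assuming a single in-memory cache keyed by (account, date range). `java.util.BitSet` stands in for Lucene's OpenBitSetDISI, and plain arrays stand in for the indexed account and date fields; `CombinedFilterCache` and its methods are hypothetical names, and treating the date range as inclusive is an assumption. The point is only that the two filters are intersected once and the result is reused for later queries with the same combination.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: BitSet stands in for OpenBitSetDISI, and plain arrays
// stand in for the indexed account and date fields.
final class CombinedFilterCache {
    private final String[] accountByDoc; // account value of each doc ID
    private final long[] dateByDoc;      // numeric date value of each doc ID
    private final Map<String, BitSet> cache = new HashMap<>();
    private int builds = 0;              // how many combined filters were built

    CombinedFilterCache(String[] accountByDoc, long[] dateByDoc) {
        this.accountByDoc = accountByDoc;
        this.dateByDoc = dateByDoc;
    }

    // Returns the docs matching both the account and the inclusive date range.
    // The bit set is built on first use and shared afterwards, so callers must not modify it.
    BitSet get(String account, long from, long to) {
        String key = account + "|" + from + "|" + to;
        return cache.computeIfAbsent(key, k -> build(account, from, to));
    }

    private BitSet build(String account, long from, long to) {
        builds++;
        BitSet accountBits = new BitSet(accountByDoc.length);
        BitSet dateBits = new BitSet(dateByDoc.length);
        for (int doc = 0; doc < accountByDoc.length; doc++) {
            if (accountByDoc[doc].equals(account)) accountBits.set(doc);
            if (dateByDoc[doc] >= from && dateByDoc[doc] <= to) dateBits.set(doc);
        }
        accountBits.and(dateBits); // intersect the two filters once
        return accountBits;
    }

    int builds() { return builds; }

    public static void main(String[] args) {
        CombinedFilterCache filters = new CombinedFilterCache(
            new String[] {"a", "b", "a", "a", "b"},
            new long[] {10, 20, 30, 40, 50});
        System.out.println(filters.get("a", 15, 45)); // {2, 3}
        filters.get("a", 15, 45);                     // served from the cache
        System.out.println(filters.builds());         // 1
    }
}
```

A real cache would also need to be invalidated whenever the index (and therefore the doc IDs) changes.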

RE: Using filters to speed up queries

2010-10-24 Thread Uwe Schindler
…er 24, 2010 12:34 PM
To: dev@lucene.apache.org
Subject: Re: Using filters to speed up queries
Here is what I've found so far: I have three main sets to use in a query:
Account MUST be xxx
User query
DateRange on the query MUST be in (a,b); it is a NumericField
I tried the following com…

Re: Using filters to speed up queries

2010-10-24 Thread Khash Sajadi
…ctober 24, 2010 12:34 PM
> *To:* dev@lucene.apache.org
> *Subject:* Re: Using filters to speed up queries
> Here is what I've found so far:
> I have three main sets to use in a query:
> Account MUST be xxx
> User query
> DateRange on…

RE: Using filters to speed up queries

2010-10-24 Thread Uwe Schindler
…o: dev@lucene.apache.org
Subject: Re: Using filters to speed up queries
Here is what I've found so far: I have three main sets to use in a query:
Account MUST be xxx
User query
DateRange on the query MUST be in (a,b); it is a NumericField
I tried the following combinations (all using a Boolea…

Re: Using filters to speed up queries

2010-10-24 Thread Khash Sajadi
Here is what I've found so far. I have three main sets to use in a query:
Account MUST be xxx
User query
DateRange on the query MUST be in (a,b); it is a NumericField
I tried the following combinations (all using a BooleanQuery with the user query added to it):
1. One:
- Add ACCOUNT as a TermQuery…

Re: Using filters to speed up queries

2010-10-24 Thread Paul Elschot
On Sunday 24 October 2010 00:18:48, Khash Sajadi wrote:
> My index contains documents for different users. Each document has the user id as a field on it.
> There are about 500 different users with 3 million documents.
> Currently I'm calling Search with the query (parsed from user) and…
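For a rough sense of how selective a per-user filter is, assume the 3 million documents are spread evenly over the 500 users (the message does not say they are). One user's account filter would then match on average:

```latex
\frac{3\,000\,000\ \text{documents}}{500\ \text{users}} = 6\,000\ \text{documents per user},
\qquad
\frac{6\,000}{3\,000\,000} = 0.002 = 0.2\%
```

That average is below the ~1% figure McCandless mentions from his own testing, though a very active account could be considerably denser than the average.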

Re: Using filters to speed up queries

2010-10-24 Thread Michael McCandless
Unfortunately, Lucene's performance with filters isn't great. This is because we now always apply filters "up high", using a leapfrog approach, where we alternate asking the filter and then the scorer to skip to each other's docID. But if the filter accepts "enough" (~1% in my testing) of the doc
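To make the leapfrog idea concrete, here is a minimal sketch that intersects two sorted lists of doc IDs by alternately asking each side to skip to the other's current doc. `SortedDocs` and `Leapfrog` are hypothetical names loosely modeled on the nextDoc()/advance() style of iteration, not Lucene's actual classes, and `advance()` here is a plain linear scan where a real index would use skip data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical iterator over increasing doc IDs (not Lucene's DocIdSetIterator).
final class SortedDocs {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int pos = -1;

    SortedDocs(int... docs) { this.docs = docs; }

    int docID() { return pos < 0 ? -1 : (pos < docs.length ? docs[pos] : NO_MORE_DOCS); }

    // Moves to the next doc in the list.
    int nextDoc() { pos++; return docID(); }

    // Moves forward to the first doc >= target (always moves at least one step).
    int advance(int target) {
        do { pos++; } while (pos < docs.length && docs[pos] < target);
        return docID();
    }
}

final class Leapfrog {
    // Returns the doc IDs present in both iterators, in increasing order.
    static List<Integer> intersect(SortedDocs scorer, SortedDocs filter) {
        List<Integer> hits = new ArrayList<>();
        int s = scorer.nextDoc();
        int f = filter.nextDoc();
        while (s != SortedDocs.NO_MORE_DOCS && f != SortedDocs.NO_MORE_DOCS) {
            if (s == f) {            // both sides agree: a match
                hits.add(s);
                s = scorer.nextDoc();
                f = filter.nextDoc();
            } else if (s < f) {
                s = scorer.advance(f); // scorer leaps to the filter's doc
            } else {
                f = filter.advance(s); // filter leaps to the scorer's doc
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(intersect(new SortedDocs(1, 4, 7, 9, 15),
                                     new SortedDocs(2, 4, 9, 10, 15, 20))); // [4, 9, 15]
    }
}
```

With a skip-capable `advance()`, each leap can pass over many non-matching docs at once; this toy version still touches them one by one.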

RE: Using filters to speed up queries

2010-10-23 Thread Uwe Schindler
…k]
Sent: Sunday, October 24, 2010 12:52 AM
To: dev@lucene.apache.org
Subject: Re: Using filters to speed up queries
On the topic of BooleanQuery. Would the order of the queries being added matter? Is it clever enough to skip the second query when the first one is returning nothing and is a MUST

Re: Using filters to speed up queries

2010-10-23 Thread Khash Sajadi
On the topic of BooleanQuery: would the order in which the queries are added matter? Is it clever enough to skip the second query when the first one returns nothing and is a MUST?
On 23 October 2010 23:47, Khash Sajadi wrote:
> Thanks. Will try it. Been thinking about separate indexes but have

Re: Using filters to speed up queries

2010-10-23 Thread Khash Sajadi
Thanks. Will try it. I've been thinking about separate indexes but have one worry: memory and file-handle usage. I'm worried that in some scenarios I might end up with thousands of IndexReaders/IndexWriters open in the process (this is on Windows). How is that going to play out with memory?
On 23 October 2010

Re: Using filters to speed up queries

2010-10-23 Thread Mark Harwood
Look at BooleanQuery with 2 "must" clauses - one for the query, one for a ConstantScoreQuery wrapping the filter. BooleanQuery should then automatically use skips when reading matching docs from the main query and skip to the next docs identified by the filter. Give it a try, otherwise you ma…
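In Lucene 3.x the setup Mark describes is a BooleanQuery with two `BooleanClause.Occur.MUST` clauses, one of them `new ConstantScoreQuery(filter)`. The toy sketch below only models the behavior: the user query's matches drive iteration, the filter is skipped forward to each candidate, and the filter clause is assumed to add the same hypothetical constant (`FILTER_CONSTANT`) to every hit. This is not Lucene's scoring formula; it illustrates that a constant added to every match leaves the query's ranking unchanged. `ConstantScoreConjunction` and `Hit` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ConstantScoreConjunction {
    // Hypothetical constant contributed by the wrapped filter clause.
    static final float FILTER_CONSTANT = 1.0f;

    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return doc + ":" + score; }
    }

    // queryDocs and filterDocs must be sorted ascending;
    // queryScores[i] is the user query's score for queryDocs[i].
    static List<Hit> search(int[] queryDocs, float[] queryScores, int[] filterDocs) {
        List<Hit> hits = new ArrayList<>();
        int f = 0;
        for (int i = 0; i < queryDocs.length && f < filterDocs.length; i++) {
            int doc = queryDocs[i];
            // Skip the filter forward to its first accepted doc >= doc.
            while (f < filterDocs.length && filterDocs[f] < doc) f++;
            if (f < filterDocs.length && filterDocs[f] == doc) {
                // Both MUST clauses match; the filter adds the same constant to every hit.
                hits.add(new Hit(doc, queryScores[i] + FILTER_CONSTANT));
            }
        }
        // Highest score first; List.sort is stable, so ties keep doc order.
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search(new int[] {2, 5, 8, 11},
                                  new float[] {0.5f, 2.0f, 1.25f, 0.75f},
                                  new int[] {1, 5, 8, 12})); // [5:3.0, 8:2.25]
    }
}
```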