Re: Optimizing unordered queries

2009-07-08 Thread Nigel
I created a benchmark test using real queries from our logs. I kept the LRU cache the same for now and varied the index divisor:

index divisor = 1: 768 sec.
index divisor = 4: 788 sec. (+ 3%)
index divisor = 8: 855 sec. (+ 11%)
index divisor = 16: 997 sec. (+ 30%)

This is exciting news for me, a
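For anyone checking the percentages above, here is a minimal sketch of the arithmetic in Java. The class and method names are made up for illustration; only the timings come from the message.

```java
// Hypothetical helper, not part of Lucene: recomputes the relative
// slowdown figures quoted in the benchmark message.
class DivisorSlowdown {
    /** Percent increase of seconds over baselineSeconds, rounded to the nearest whole percent. */
    static long percentSlower(double baselineSeconds, double seconds) {
        return Math.round((seconds - baselineSeconds) / baselineSeconds * 100.0);
    }

    /** Formats one result line in the same style as the message. */
    static String describe(int divisor, double baselineSeconds, double seconds) {
        return "index divisor = " + divisor + ": " + Math.round(seconds)
                + " sec. (+ " + percentSlower(baselineSeconds, seconds) + "%)";
    }
}
```

Calling `DivisorSlowdown.percentSlower(768, 788)`, `percentSlower(768, 855)` and `percentSlower(768, 997)` reproduces the 3%, 11% and 30% figures, each relative to the divisor = 1 run.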

Re: Optimizing unordered queries

2009-07-07 Thread Jason Rutherglen
Ah ok, I was thinking we'd wait for the new flex indexing patch. I had started working along these lines before and will take it on as a project (which is, I believe, reducing the memory consumption of the term dictionary). I plan to segue it into the tag index at some point.

On Tue, Jul 7, 2009 at

Re: Optimizing unordered queries

2009-07-07 Thread Michael McCandless
OK good to hear you have a sane number of TermInfos now... I think many apps don't have nearly as many unique terms as you do; your approach (increase index divisor & LRU cache) sounds reasonable. It'll make warming more important. Please report back how it goes! Lucene is unfortunately rather w
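To make the "LRU cache" half of that approach concrete, here is a minimal generic LRU cache in Java built on `LinkedHashMap`. This is an illustrative sketch only, not Lucene's internal cache class; the name `LruCache` and the capacity handling are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a small least-recently-used cache, NOT Lucene's own
// implementation. With accessOrder = true, LinkedHashMap moves an entry to
// the end on every get/put, so the eldest entry is the least recently used.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict once we exceed capacity.
        return size() > capacity;
    }
}
```

With a capacity of 2, putting "a" and "b", reading "a", then putting "c" evicts "b", since "a" was touched more recently. A larger capacity trades memory for fewer repeated lookups of hot terms, which is also why warming matters more: a cold cache starts out empty.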

Re: Optimizing unordered queries

2009-07-06 Thread Nigel
On Mon, Jul 6, 2009 at 12:37 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> On Mon, Jun 29, 2009 at 9:33 AM, Nigel wrote:
>
> > Ah, I was confused by the index divisor being 1 by default: I thought it
> > meant that all terms were being loaded. I see now in SegmentTermEnum that

Re: Optimizing unordered queries

2009-07-06 Thread Michael McCandless
On Mon, Jun 29, 2009 at 9:33 AM, Nigel wrote:
> Ah, I was confused by the index divisor being 1 by default: I thought it
> meant that all terms were being loaded.  I see now in SegmentTermEnum that
> the every-128th behavior is implemented at a lower level.
>
> But I'm even more confused about why
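The interaction between the every-128th sampling and the index divisor can be sketched as follows. This is a simplified Java model written for this thread, not Lucene's actual code: the class `SparseTermIndex` and its methods are hypothetical, and the "on-disk" dictionary is just an array. It keeps every (interval × divisor)-th term in memory, binary-searches that sample, then scans forward through the full sorted dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model (assumption, not Lucene source): an in-memory sample of
// a sorted term dictionary. A larger divisor means fewer terms in memory
// but a longer scan per lookup.
class SparseTermIndex {
    private final String[] allTerms;   // stands in for the full sorted dictionary on disk
    private final int step;            // interval * divisor
    private final String[] indexTerms; // the sampled terms held in memory

    SparseTermIndex(String[] sortedTerms, int interval, int divisor) {
        this.allTerms = sortedTerms;
        this.step = interval * divisor;
        List<String> sample = new ArrayList<String>();
        for (int i = 0; i < sortedTerms.length; i += step) {
            sample.add(sortedTerms[i]);
        }
        this.indexTerms = sample.toArray(new String[0]);
    }

    int inMemoryTerms() {
        return indexTerms.length;
    }

    /** Returns the position of term in the full dictionary, or -1 if absent. */
    int lookup(String term) {
        int pos = Arrays.binarySearch(indexTerms, term);
        // On a miss, binarySearch returns -(insertionPoint) - 1, so the last
        // sampled term <= term sits at insertionPoint - 1.
        int block = pos >= 0 ? pos : -pos - 2;
        if (block < 0) {
            return -1; // term sorts before the first dictionary entry
        }
        int start = block * step;
        int end = Math.min(start + step, allTerms.length);
        for (int i = start; i < end; i++) { // scans at most `step` entries
            int cmp = allTerms[i].compareTo(term);
            if (cmp == 0) return i;
            if (cmp > 0) break;
        }
        return -1;
    }

    /** Builds n zero-padded demo terms ("t0000", "t0001", ...) in sorted order. */
    static String[] demoTerms(int n) {
        String[] terms = new String[n];
        for (int i = 0; i < n; i++) {
            terms[i] = String.format("t%04d", i);
        }
        return terms;
    }
}
```

With 1000 demo terms and an interval of 128, a divisor of 1 keeps 8 terms in memory and a divisor of 4 keeps 2. Lookups return the same positions either way; the worst-case scan grows from 128 to 512 entries.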

Re: Optimizing unordered queries

2009-06-29 Thread Nigel
On Mon, Jun 29, 2009 at 6:28 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> On Sun, Jun 28, 2009 at 9:08 PM, Nigel wrote:
> >> Unfortunately the TermInfos must still be hit to look up the
> >> freq/proxOffset in the postings files.
> >
> > But for that data you only have to hit the T

Re: Optimizing unordered queries

2009-06-29 Thread Michael McCandless
On Sun, Jun 28, 2009 at 9:08 PM, Nigel wrote:
>> Unfortunately the TermInfos must still be hit to look up the
>> freq/proxOffset in the postings files.
>
> But for that data you only have to hit the TermInfos for the terms you're
> searching, correct?  So, assuming that there are vastly more terms

Re: Optimizing unordered queries

2009-06-28 Thread Nigel
On Fri, Jun 26, 2009 at 11:06 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> On Thu, Jun 25, 2009 at 10:11 PM, Nigel wrote:
>
> > Currently we're (perhaps naively) doing the equivalent of
> > query.weight(searcher).scorer(reader).score(collector). Obviously there's a
> > certain a

Re: Optimizing unordered queries

2009-06-28 Thread Nigel
On Fri, Jun 26, 2009 at 10:52 AM, eks dev wrote:
>
> also see,
> http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/BooleanQuery.html#getAllowDocsOutOfOrder()

Interesti

Re: Optimizing unordered queries

2009-06-28 Thread Nigel
On Fri, Jun 26, 2009 at 10:51 AM, eks dev wrote:
>
> You omitNorms(), did you also omitTf()?

We did, but had to include TF after all since omitting it also dropped position information, which we needed for phrase queries. I didn't think it was possible to remove just the TFs without the positi

Re: Optimizing unordered queries

2009-06-26 Thread Michael McCandless
On Thu, Jun 25, 2009 at 10:11 PM, Nigel wrote:
> Currently we're (perhaps naively) doing the equivalent of
> query.weight(searcher).scorer(reader).score(collector).  Obviously there's a
> certain amount of unnecessary calculation that results from this if you
> don't care about sorting.  Are there

Re: Optimizing unordered queries

2009-06-26 Thread eks dev
also see, http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/BooleanQuery.html#getAllowDocsOutOfOrder()

- Original Message
> From: Nigel
> To: java-user@lucene.apache.org
> Sent: Friday, 26 June, 2009 4:11:53
> Subject: Optimizing unordered queries
>

Re: Optimizing unordered queries

2009-06-26 Thread eks dev
Scoring information ... try:

- Original Message
> From: Nigel
> To: java-user@lucene.apache.org
> Sent: Friday, 26 June, 2009 4:11:53
> Subject: Optimizing unordered queries
>
> I recently posted some questions about performance problems with large
> indexes. On

Optimizing unordered queries

2009-06-25 Thread Nigel
I recently posted some questions about performance problems with large indexes. One key thing about our situation is that we don't need sorted results (either by relevance or any other key). I've been looking into our memory usage and tracing through some code, which in combination with the recen