Hmmm, could you show us what you do in your collector? Because one of the gotchas about a collector is loading the documents in the inner loop. Quick test: comment out whatever you're doing in the underlying collector loop, and see if there's *any* noticeable difference in speed. That'll tell you whether your problems arise from the filter construction/search or what you're doing in the collector....
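To make that test concrete, here's a rough sketch of the comparison I have in mind. It deliberately does not use the Lucene API: search() and loadField() are hypothetical stand-ins for your searcher calling collect(doc) once per hit and for whatever stored-field fetch you do per hit. The idea is just to time the same loop with an empty collector body, with the loading done inside the loop, and with only doc ids gathered inside the loop and the loading deferred until afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Stand-in for the real thing: nothing here is a Lucene class or method.
final class CollectorTiming {

    /** Hypothetical search loop: calls the collector once per matching doc id. */
    static void search(int[] hits, IntConsumer collector) {
        for (int doc : hits) {
            collector.accept(doc);
        }
    }

    /** Hypothetical per-hit work, standing in for loading a stored document. */
    static String loadField(int doc) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append((char) ('a' + (doc + i) % 26));
        }
        return sb.toString();
    }

    /** Cheap collector: only remembers doc ids, loads nothing in the loop. */
    static List<Integer> collectIds(int[] hits) {
        List<Integer> ids = new ArrayList<>();
        search(hits, ids::add);
        return ids;
    }

    /** Wall-clock time of one run, in nanoseconds. */
    static long timeNanos(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] hits = new int[1000];
        for (int i = 0; i < hits.length; i++) {
            hits[i] = i * 7;
        }

        // 1. Collector body "commented out": filter + search cost only.
        long empty = timeNanos(() -> search(hits, doc -> { }));

        // 2. Loading a value for every hit inside the inner loop.
        List<String> values = new ArrayList<>();
        long loading = timeNanos(() -> search(hits, doc -> values.add(loadField(doc))));

        // 3. Collect ids only; any loading would happen after the search.
        long idsOnly = timeNanos(() -> collectIds(hits));

        System.out.println("empty collector:   " + empty + " ns");
        System.out.println("load in loop:      " + loading + " ns");
        System.out.println("collect ids only:  " + idsOnly + " ns");
    }
}
```

If the empty-collector run is about as slow as your real run, the time is going into the filter/search and not into the collector. If it drops sharply, the per-hit work in the collector is the thing to look at.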
Best
Erick

On Sun, Nov 22, 2009 at 11:41 AM, Eran Sevi <erans...@gmail.com> wrote:

> I think it shouldn't take X5 times longer since the number of results is
> only about X2 times larger (and much smaller than the number of terms in
> the filter), but maybe I'm wrong here since I'm not familiar with the
> filter internals.
>
> Unfortunately, the time to construct the filter is mere milliseconds.
> almost all of the time (~5secs) are spent in the search method.
> I'm using a collector to retrieve all the results (and fetch a value for
> some fields) but without the filter this also takes less then a second for
> the same number of results.
>
> On Sun, Nov 22, 2009 at 5:57 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Hmmm, I'm not very clear here. Are you saying that you effectively
> > form 10-50K filters and OR them all together? That would be
> > consistent with the 50K case taking approx. 5X a long as the 10K
> > case.....
> >
> > Do you know where in your code the time is being spent? That'd
> > be a big help in suggesting alternatives. If I'm on the right track,
> > I'd expect the time to be spent assembling the filters.....
> >
> > Not much help here, but I'm having trouble wrapping my head
> > around this...
> >
> > Best
> > Erick
> >
> > On Sun, Nov 22, 2009 at 9:48 AM, Eran Sevi <erans...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I have a need to filter my queries using a rather large subset of
> > > terms (can be 10K or even 50K).
> > > All these terms are sure to exist in the index so the number of
> > > results can be about the same number of terms in the filter.
> > > The terms are numbers but are not subsequent and are from a large
> > > set of possible values (so range queries are probably not good for
> > > me).
> > > The index itself is about 1M docs and running even a simple query
> > > with such a large filter takes a lot of time even if the number of
> > > results is only a few hundred docs.
> > > It seems like the speed is affected by the length of the filter even
> > > if the number of results remains more or less the same, which is
> > > logical but not by such a large loss of performance as I'm
> > > experiencing (running the query with a 10K terms filter takes an
> > > average of 1s 187ms with 600 results while running it with a 50K
> > > terms filter takes an average of 5s 207ms with 1000 results).
> > >
> > > Currently I'm using a QueryFilter with a boolean query in which I
> > > "OR" the different terms together.
> > > I also can't use a cached filter efficiently since the terms to
> > > filter on change almost every query.
> > >
> > > I was wondering if there's a better way to filter my queries so they
> > > won't take a few seconds to run?
> > >
> > > Thanks in advance for any advise,
> > > Eran.