Re: faceted search performance

2009-10-27 Thread Toke Eskildsen
On Mon, 2009-10-12 at 20:02 +0200, Jake Mannix wrote:
> This killer is the "TermQuery for each term" part - this is huge. You need
> to invert this process, and use your query as is, but while walking in the
> HitCollector, on each doc which matches your query, increment counters for
> each of the
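
A minimal sketch of the inversion Jake describes, written in plain Java rather than against the Lucene API. The class `FacetCounter`, the `docTerms` array (a stand-in for whatever per-document term lookup the index provides) and the use of integer term ids are all assumptions made for illustration; the thread names none of them. The point is only the shape of the loop: one pass over the matching docs, one counter increment per term in each doc, instead of one query per term.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical stand-in for an index: docTerms[docId] lists the term ids in that doc.
class FacetCounter {

    /** Walks the matching docs once and increments one counter per term id seen. */
    static int[] countTerms(BitSet matchingDocs, int[][] docTerms, int numTerms) {
        int[] counts = new int[numTerms];
        // This outer loop plays the role of the collector being called for each hit.
        for (int doc = matchingDocs.nextSetBit(0); doc >= 0; doc = matchingDocs.nextSetBit(doc + 1)) {
            for (int termId : docTerms[doc]) {
                counts[termId]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docTerms = { {0, 1}, {1}, {2}, {0, 1} };
        BitSet hits = new BitSet();
        hits.set(0);
        hits.set(1);
        hits.set(3);
        // Doc 2 does not match, so term 2 is never counted.
        System.out.println(Arrays.toString(countTerms(hits, docTerms, 3))); // prints [2, 3, 0]
    }
}
```

The work here is proportional to the number of hits times the terms per hit, not to the total number of terms in the index.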

Re: faceted search performance

2009-10-13 Thread Christoph Boosz
Ok, I will have a shot at the ascending docId order.
Chris
2009/10/13 Paul Elschot
> On Monday 12 October 2009 23:29:07 Christoph Boosz wrote:
> > Hi Paul,
> >
> > Thanks for your suggestion. I will test it within the next few days.
> > However, due to memory limitations, it will only work if t

Re: faceted search performance

2009-10-13 Thread Paul Elschot
On Monday 12 October 2009 23:29:07 Christoph Boosz wrote:
> Hi Paul,
>
> Thanks for your suggestion. I will test it within the next few days.
> However, due to memory limitations, it will only work if the number of hits
> is small enough, am I right?
One can load a single term vector at a time, s

Re: faceted search performance

2009-10-12 Thread Christoph Boosz
Hi Paul,
Thanks for your suggestion. I will test it within the next few days.
However, due to memory limitations, it will only work if the number of hits is small enough, am I right?
Chris
2009/10/12 Paul Elschot
> Chris,
>
> You could also store term vectors for all docs at indexing
> time, a

Re: faceted search performance

2009-10-12 Thread Paul Elschot
Chris,
You could also store term vectors for all docs at indexing time, and add the term vectors for the matching docs into a (large) map of terms in RAM.
Regards,
Paul Elschot
On Monday 12 October 2009 21:30:48 Christoph Boosz wrote:
> Hi Jake,
>
> Thanks for your helpful explanation.
> In fac
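
A rough sketch of Paul's suggestion, under the assumption that each stored term vector can be read as a term-to-frequency map. The class `TermVectorMerger` and both method names are hypothetical, and reading the vectors from the index is deliberately left out. Vectors are consumed one at a time through an iterator, so only the running totals map has to stay in RAM, not the vectors of all hits.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical helper: a term vector is modelled as a term -> frequency map.
class TermVectorMerger {

    /** Adds one doc's term vector into the running totals. */
    static void addVector(Map<String, Integer> totals, Map<String, Integer> termVector) {
        for (Map.Entry<String, Integer> e : termVector.entrySet()) {
            totals.merge(e.getKey(), e.getValue(), Integer::sum);
        }
    }

    /** Consumes the vectors of the matching docs one by one and returns the totals. */
    static Map<String, Integer> mergeAll(Iterator<Map<String, Integer>> vectorsOfMatchingDocs) {
        Map<String, Integer> totals = new HashMap<>();
        while (vectorsOfMatchingDocs.hasNext()) {
            addVector(totals, vectorsOfMatchingDocs.next());
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> doc1 = new HashMap<>();
        doc1.put("lucene", 2);
        doc1.put("facet", 1);
        Map<String, Integer> doc2 = new HashMap<>();
        doc2.put("facet", 3);
        // Totals: facet -> 4, lucene -> 2 (HashMap print order is not guaranteed).
        System.out.println(mergeAll(Arrays.asList(doc1, doc2).iterator()));
    }
}
```

The totals map still grows with the number of distinct terms in the result, which is the "(large) map" part of the suggestion.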

Re: faceted search performance

2009-10-12 Thread Christoph Boosz
Hi Jake, Thanks for your helpful explanation. In fact, my initial solution was to traverse each document in the result once and count the contained terms. As you mentioned, this process took a lot of memory. Trying to confine the memory usage with the facet approach, I was surprised by the decline

Re: faceted search performance

2009-10-12 Thread Jake Mannix
Hey Chris,
On Mon, Oct 12, 2009 at 10:30 AM, Christoph Boosz < christoph.bo...@googlemail.com> wrote:
> Thanks for your reply.
> Yes, it's likely that many terms occur in few documents.
>
> If I understand you right, I should do the following:
> -Write a HitCollector that simply increments a coun

Re: faceted search performance

2009-10-12 Thread Christoph Boosz
Thanks for your reply.
Yes, it's likely that many terms occur in few documents.
If I understand you right, I should do the following:
-Write a HitCollector that simply increments a counter
-Get the filter for the user query once: new CachingWrapperFilter(new QueryWrapperFilter(userQuery));
-Create

Re: faceted search performance

2009-10-12 Thread John Wang
Given you have 1M docs and about 1M terms, do you see very few docs per term? If your DocSet per term is very sparse, BitSet is probably not a good representation. A simple int array may be better for memory, and faster for iterating.
-John
On Mon, Oct 12, 2009 at 8:45 AM, Paul Elschot wrote:
> On
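
A back-of-the-envelope sketch of John's point, in plain Java. The class `SparseDocSet` and its methods are hypothetical, and the byte counts cover only the payload (object headers and array overhead are ignored). A bit set spends one bit per document in the index no matter how few docs contain the term, while a sorted int array spends 4 bytes per doc that actually contains it.

```java
import java.util.Arrays;

// Hypothetical helper comparing the two doc-set representations by payload size.
class SparseDocSet {

    /** Payload bytes of a bit set covering maxDoc docs (one bit per doc, 64-bit words). */
    static long bitSetBytes(int maxDoc) {
        return ((maxDoc + 63L) / 64L) * 8L;
    }

    /** Payload bytes of a sorted int array holding only the docs that contain the term. */
    static long intArrayBytes(int cardinality) {
        return 4L * cardinality;
    }

    /** Membership test on the sorted array via binary search. */
    static boolean contains(int[] sortedDocIds, int doc) {
        return Arrays.binarySearch(sortedDocIds, doc) >= 0;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        int[] docsForTerm = {17, 4_096, 999_999}; // a term occurring in three docs
        System.out.println("bit set:   " + bitSetBytes(maxDoc) + " bytes");               // 125000
        System.out.println("int array: " + intArrayBytes(docsForTerm.length) + " bytes"); // 12
        System.out.println(contains(docsForTerm, 4_096)); // true
    }
}
```

With 1M docs the bit set costs about 125,000 bytes per term, so under these assumptions the int array is smaller for any term occurring in fewer than roughly 31,250 docs. Iterating the array also touches only the matching docs.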

Re: faceted search performance

2009-10-12 Thread Paul Elschot
On Monday 12 October 2009 14:53:45 Christoph Boosz wrote:
> Hi,
>
> I have a question related to faceted search. My index contains more than 1
> million documents, and nearly 1 million terms. My aim is to get a DocIdSet
> for each term occurring in the result of a query. I use the approach
> descr

faceted search performance

2009-10-12 Thread Christoph Boosz
Hi, I have a question related to faceted search. My index contains more than 1 million documents, and nearly 1 million terms. My aim is to get a DocIdSet for each term occurring in the result of a query. I use the approach described on http://sujitpal.blogspot.com/2007/04/lucene-search-within-sear