On Mon, 2009-10-12 at 20:02 +0200, Jake Mannix wrote:
> This killer is the "TermQuery for each term" part - this is huge. You need
> to invert this process, and use your query as is, but while walking in the
> HitCollector, on each doc which matches your query, increment counters for
> each of the terms in that document.
Ok, I will have a shot at the ascending docId order.
Chris
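A rough sketch of the inverted approach Jake describes, using the 2.4-era HitCollector API; it assumes term vectors were stored for a field called "contents" (the field and class names here are only illustrative):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.search.HitCollector;

// Walks the hits of the user query once, in ascending docId order, and
// increments a counter for every term contained in each matching document.
public class TermCountingCollector extends HitCollector {

  private final IndexReader reader;
  private final Map<String, Integer> counts = new HashMap<String, Integer>();

  public TermCountingCollector(IndexReader reader) {
    this.reader = reader;
  }

  public void collect(int doc, float score) {
    try {
      // Load only this document's term vector.
      TermFreqVector tfv = reader.getTermFreqVector(doc, "contents");
      if (tfv == null) {
        return; // no term vector stored for this document
      }
      String[] terms = tfv.getTerms();
      for (int i = 0; i < terms.length; i++) {
        Integer old = counts.get(terms[i]);
        counts.put(terms[i], Integer.valueOf(old == null ? 1 : old.intValue() + 1));
      }
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public Map<String, Integer> getCounts() {
    return counts;
  }
}

Used as searcher.search(userQuery, new TermCountingCollector(reader)); afterwards the map holds, for every term occurring in at least one hit, the number of hits containing it, so only the terms of matching documents are ever touched.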
2009/10/13 Paul Elschot
> On Monday 12 October 2009 23:29:07 Christoph Boosz wrote:
> > Hi Paul,
> >
> > Thanks for your suggestion. I will test it within the next few days.
> > However, due to memory limitations, it will only work if the number of
> > hits is small enough, am I right?
On Monday 12 October 2009 23:29:07 Christoph Boosz wrote:
> Hi Paul,
>
> Thanks for your suggestion. I will test it within the next few days.
> However, due to memory limitations, it will only work if the number of hits
> is small enough, am I right?
One can load a single term vector at a time, s
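Loading just a single document's vector, as Paul suggests, is one call (reader and docId are assumed to be in scope, and the field name "contents" is an assumption):

TermFreqVector tfv = reader.getTermFreqVector(docId, "contents");
if (tfv != null) {
  String[] terms = tfv.getTerms();        // terms occurring in this document
  int[] freqs = tfv.getTermFrequencies(); // their within-document frequencies
}

At any moment only one vector is in memory; what grows is the map of accumulated term counts, bounded by the number of distinct terms in the hits rather than by the number of hits.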
Hi Paul,
Thanks for your suggestion. I will test it within the next few days.
However, due to memory limitations, it will only work if the number of hits
is small enough, am I right?
Chris
2009/10/12 Paul Elschot
> Chris,
>
> You could also store term vectors for all docs at indexing
> time, and add the termvectors for the matching docs into a
> (large) map of terms in RAM.
Chris,
You could also store term vectors for all docs at indexing
time, and add the termvectors for the matching docs into a
(large) map of terms in RAM.
Regards,
Paul Elschot
On Monday 12 October 2009 21:30:48 Christoph Boosz wrote:
> Hi Jake,
>
> Thanks for your helpful explanation.
> In fact, my initial solution was to traverse each document in the result
> once and count the contained terms.
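Storing term vectors for all docs at indexing time, as Paul suggests, is a per-field option; a minimal sketch (the field name, the text variable and the writer are placeholders for whatever the index actually uses):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
doc.add(new Field("contents", text,
    Field.Store.NO,            // the text itself need not be stored
    Field.Index.ANALYZED,      // tokenize as usual
    Field.TermVector.YES));    // additionally store this doc's term vector
writer.addDocument(doc);

The index gets larger, but IndexReader.getTermFreqVector can then return a vector for every document.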
Hi Jake,
Thanks for your helpful explanation.
In fact, my initial solution was to traverse each document in the result
once and count the contained terms. As you mentioned, this process took a
lot of memory.
Trying to confine the memory usage with the facet approach, I was surprised
by the decline in performance.
Hey Chris,
On Mon, Oct 12, 2009 at 10:30 AM, Christoph Boosz <
christoph.bo...@googlemail.com> wrote:
> Thanks for your reply.
> Yes, it's likely that many terms occur in few documents.
>
> If I understand you right, I should do the following:
> -Write a HitCollector that simply increments a counter
Thanks for your reply.
Yes, it's likely that many terms occur in few documents.
If I understand you right, I should do the following:
-Write a HitCollector that simply increments a counter
-Get the filter for the user query once: new CachingWrapperFilter(new
QueryWrapperFilter(userQuery));
-Create
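Assuming the truncated last step is the "TermQuery for each term" that Jake's reply quotes, the plan would look roughly like this in the 2.4-era API (searcher, reader and userQuery are assumed to be in scope):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;

static void perTermCounts(IndexSearcher searcher, IndexReader reader, Query userQuery)
    throws IOException {
  // One cached filter restricts every per-term search to the user query.
  Filter userFilter = new CachingWrapperFilter(new QueryWrapperFilter(userQuery));

  final int[] hits = new int[1];
  HitCollector counter = new HitCollector() {
    public void collect(int doc, float score) {
      hits[0]++;
    }
  };

  TermEnum termEnum = reader.terms();   // enumerates every term in the index
  while (termEnum.next()) {
    hits[0] = 0;
    searcher.search(new TermQuery(termEnum.term()), userFilter, counter);
    // hits[0] is now the number of result docs containing this term
  }
  termEnum.close();
}

This runs one search per term, roughly a million searches for this index, which is the part Jake calls the killer and suggests inverting.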
Given you have 1M docs and about 1M terms, do you see very few docs per
term?
If your DocSet per term is very sparse, a BitSet is probably not a good
representation. A simple int array may be better for memory, and faster for
iterating.
-John
On Mon, Oct 12, 2009 at 8:45 AM, Paul Elschot wrote:
> On
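A hedged sketch of John's suggestion, collecting one term's postings into an int[] sized by its document frequency instead of a bit set spanning the whole index:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

// Returns the ascending docIds of one term as a plain int array.
static int[] docIdsForTerm(IndexReader reader, Term term) throws IOException {
  // docFreq gives an upper bound (it still counts deleted documents).
  int[] docs = new int[reader.docFreq(term)];
  TermDocs td = reader.termDocs(term);
  int n = 0;
  while (td.next()) {
    docs[n++] = td.doc();
  }
  td.close();
  if (n < docs.length) {   // can happen when the index has deletions
    int[] trimmed = new int[n];
    System.arraycopy(docs, 0, trimmed, 0, n);
    docs = trimmed;
  }
  return docs;   // sorted, cheap to iterate, easy to merge-intersect
}

For a term that occurs in only a few hundred of the 1M documents, that is a few hundred ints versus roughly 125 KB for a bit set with one bit per document.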
On Monday 12 October 2009 14:53:45 Christoph Boosz wrote:
> Hi,
>
> I have a question related to faceted search. My index contains more than 1
> million documents, and nearly 1 million terms. My aim is to get a DocIdSet
> for each term occurring in the result of a query. I use the approach
> described on
> http://sujitpal.blogspot.com/2007/04/lucene-search-within-sear
Hi,
I have a question related to faceted search. My index contains more than 1
million documents, and nearly 1 million terms. My aim is to get a DocIdSet
for each term occurring in the result of a query. I use the approach
described on
http://sujitpal.blogspot.com/2007/04/lucene-search-within-sear
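For orientation, the "DocIdSet per term" part amounts to something like the sketch below (2.4-era API; how the query's own doc set is obtained is left out, it is simply assumed to be available as an OpenBitSet):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.OpenBitSet;

// Counts how many documents of the query result (queryDocs) contain the term.
static long countTermInResult(IndexReader reader, OpenBitSet queryDocs, Term term)
    throws IOException {
  DocIdSet termDocs =
      new QueryWrapperFilter(new TermQuery(term)).getDocIdSet(reader);
  DocIdSetIterator it = termDocs.iterator();
  long count = 0;
  while (it.next()) {                // 2.4-style iteration over the term's docs
    if (queryDocs.get(it.doc())) {   // the doc matches both the term and the query
      count++;
    }
  }
  return count;
}

Done once for each of the roughly 1 million terms, this per-term pass is the cost the replies above are concerned with.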