Re: Using Lucene partly as DB and 'joining' search results.

Paul Elschot Sat, 12 Apr 2008 01:52:31 -0700

On Saturday 12 April 2008 00:03:13, Antony Bowesman wrote:
> Paul Elschot wrote:
> > On Friday 11 April 2008 13:49:59, Mathieu Lecarme wrote:
> >> Use Filter and BitSet.
> >> From the personal data, you build a Filter
> >> (http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Filter.html)
> >> which is used in the main index.
> >
> > With 1 billion mails, and possibly a Filter per user, you may want
> > to use more compact filters than BitSets, which is currently
> > possible in the development trunk of lucene.
>
> Thanks for the pointers.  I've already used Solr's DocSet interface
> in my implementation, which I think is where the ideas for the
> current Lucene enhancements came from.


The ideas came from quite a few sources. They can be traced
starting from changes.txt in the sources.

> They work well to reduce the 
> filter's footprint.  I'm also caching filters.
>
> The intention is that there is a user data index and the mail
> index(es).  The search against user data index will return a set of
> mail Ids, which is the common key between the two. Doc Ids are no 
> good between the indexes, so that means a potentially large boolean
> OR query to create the filter of labelled mails in the mail indexes. 
> I know it's a theoretical question, but will this perform?

The normal way to collect doc ids for a filter is to iterate over the
indexed ids (mail ids in your case) and set the matching bits in a
bitset. A bitset has random access, so there is no need to do this in
doc id order. An OR query has to work in doc id order so it can compute
a score per doc id, and maintaining that order costs some performance.
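
To make the bitset approach concrete, here is a minimal sketch in plain
Java. It is not Lucene code: the class name MailIdFilterSketch and the
Map from mail id to doc id are hypothetical stand-ins for the mail
index. Against a real index the lookup would normally go through
IndexReader.termDocs(Term) on the mail id field instead of a Map.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

class MailIdFilterSketch {

    /**
     * Collects the doc ids of the given mail ids into a BitSet.
     * docIdByMailId is a hypothetical stand-in for the mail index;
     * maxDoc is the number of documents in that index.
     */
    static BitSet buildFilter(Map<String, Integer> docIdByMailId,
                              Iterable<String> mailIds,
                              int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String mailId : mailIds) {
            Integer docId = docIdByMailId.get(mailId);
            if (docId != null) {
                // Random access: bits can be set in any order,
                // so the mail ids need not arrive in doc id order.
                bits.set(docId);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Toy "mail index": mail id -> doc id (assumed values).
        Map<String, Integer> index = new HashMap<String, Integer>();
        index.put("m-7", 7);
        index.put("m-2", 2);
        index.put("m-5", 5);

        // Mail ids as returned by a search on the user data index,
        // deliberately not in doc id order.
        BitSet filterBits = buildFilter(
                index, Arrays.asList("m-7", "m-2", "unknown"), 10);
        System.out.println(filterBits);
    }
}
```

No scoring and no ordering is involved here, which is where this
approach saves work compared to a large boolean OR query.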

Regards,
Paul Elschot
