Re: Using ParallelReader over large immutable index and small updatable index

Andy Liu Wed, 07 Mar 2007 08:18:25 -0800

From my understanding, MultiSearcher is used to combine two indexes that

have the same fields but different documents.  ParallelReader is used to
combine two indexes that have same documents but different fields.  I'm
trying to do the latter.  Is my understanding correct?  For example, what
I'm trying to do is have one immutable index that has these fields:


field1
field2
field3

and my "update" index that has one field

field4

Both indexes have the same documents, and the docId's are synchronized.
This allows me to execute searches like:

+field1:foo +field4:bar

field4 is a field that would be updated frequently and as real-time as
possible.  However, once I update field4, the docId's are no longer
synchronized, and ParallelReader fails.

Andy

On 3/6/07, Alexey Lef <[EMAIL PROTECTED]> wrote:


We use MultiSearcher for a similar scenario. This way you can keep the
Searcher/Reader for the read-only index alive and refresh the small index
Searcher whenever an update is made. If you have any cached filters, they
are mapped to a Reader, so the cached filters for the big index will stay
alive as well. The only (small) problem I have found so far is how
MultiSearcher handles custom Similarity (see
https://issues.apache.org/jira/browse/LUCENE-789).

Hope this helps,

Alexey

-----Original Message-----
From: Andy Liu [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 06, 2007 3:34 PM
To: java-user@lucene.apache.org
Subject: Using ParallelReader over large immutable index and small
updatable index

Is there a working solution out there that would let me use ParallelReader
to search over a large, immutable index and a smaller, auxillary index
that
is updated frequently?  Currently, from my understanding, the
ParallelReader
fails when one of the indexes is updated because the document ID's get out
of synch.  Using ParallelReader in this way is attractive for me because
it
would allow me to quickly make updates to only the fields that change.

The alternative is to use one index.  However, an update would require me
to
delete the entire document (which is quite large in my application) and
reinsert it after making updates.  This requires a lot more I/O and is a
lot
slower, and I'd like to avoid this if possible.

I can think of other alternatives, but all involve storing data and/or
bitsets in memory, which is not very scalable.  I need to be able to
handle
millions of documents.

I'm also open to any solution that don't involve ParallelReader that would
help me make quick updates in the most non-disruptive and scalable
fashion.
But it just seems that ParallelReader would be perfect for me needs, if I
can get past this issue.

I've seen posts about this issue on the list, but nothing pointing to a
solution.  Can somebody help me out?

Andy

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Using ParallelReader over large immutable index and small updatable index

Reply via email to