Re: MergePolicy public but SegmentInfos package protected?

Michael McCandless Fri, 27 Mar 2009 12:59:38 -0700

On Fri, Mar 27, 2009 at 1:12 PM, Marvin Humphrey <mar...@rectangular.com> wrote:


>> Why must merge policy be made public for realtime search? [In Lucy]
>
> Because real-time search under Lucy needs to be able to operate using multiple
> write processes, since threads will not always be available.
>
> You need to be able to tell one indexer *not* to merge anything when
> performing fast updates, and you need to be able to tell another indexer what
> to merge when performing background consolidation.

Is this because you want to avoid swamping the IO system?  I.e., you're
emulating IO prioritization.  (Which I think makes sense, but it's more
of an optimization than strictly necessary for realtime search.)

The prototype near-realtime search in Lucene (on LUCENE-1516) is fully
independent of the merge policy (but, yes, a smarter merge policy can
reduce the turnaround times).
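
To make the split Marvin describes concrete, here is a minimal sketch of two
merge choosers: one for the fast-update writer that never merges, and one for
a background consolidator that merges small segments. Everything here
(SegmentMeta, MergeChooser, NO_MERGE, smallSegments, the thresholds) is a
hypothetical illustration and not the actual Lucene or Lucy API; it also only
looks at doc counts and delete counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MergeRolesSketch {

    /** Hypothetical per-segment metadata: counts only, no reader required. */
    static final class SegmentMeta {
        final String name;
        final int docCount;
        final int delCount;

        SegmentMeta(String name, int docCount, int delCount) {
            this.name = name;
            this.docCount = docCount;
            this.delCount = delCount;
        }

        int liveDocs() {
            return docCount - delCount;
        }
    }

    /** Hypothetical policy hook; not the actual Lucene or Lucy interface. */
    interface MergeChooser {
        /** Returns the segments to merge together, or an empty list for "do nothing". */
        List<SegmentMeta> chooseMerge(List<SegmentMeta> segments);
    }

    /** For the writer doing fast updates: never pick anything to merge. */
    static final MergeChooser NO_MERGE = segments -> Collections.emptyList();

    /** For the background consolidator: merge all segments below a live-doc threshold. */
    static MergeChooser smallSegments(int maxLiveDocs) {
        return segments -> {
            List<SegmentMeta> picked = new ArrayList<>();
            for (SegmentMeta s : segments) {
                if (s.liveDocs() < maxLiveDocs) {
                    picked.add(s);
                }
            }
            // A merge needs at least two inputs in this simplified sketch.
            if (picked.size() < 2) {
                return Collections.emptyList();
            }
            return picked;
        };
    }

    public static void main(String[] args) {
        List<SegmentMeta> segments = Arrays.asList(
                new SegmentMeta("_a", 500000, 100),
                new SegmentMeta("_b", 50, 10),
                new SegmentMeta("_c", 20, 0));

        System.out.println("fast writer merges: "
                + NO_MERGE.chooseMerge(segments).size() + " segments");
        for (SegmentMeta s : smallSegments(100).chooseMerge(segments)) {
            System.out.println("consolidator picks: " + s.name);
        }
    }
}
```

Which process gets which chooser is a deployment decision; the sketch only
shows that the two roles can share one interface.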

>> > Actually, if you're not warming sort caches, launching a Lucene IndexReader
>> > isn't obscenely expensive any more -- just expensive.  Right?
>>
>> We load deleted docs on init (1 bit per doc = fast), terms index (= a
>> lot of stuff every 128 terms = maybe slow), norms on the first search
>> that hits that field (1 byte per doc = probably OK), and FieldCache on
>> first search that uses it.  So "it depends" I guess?
>
> For the purposes of MergePolicy, all you would need are the doc counts and the
> delcounts, and optionally other stuff in SegmentInfos.  In theory you could
> lazy load the other stuff like the term dictionary index.  Obviously that
> would be an unacceptable behavioral change, but it's worth noting.

True.
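
To put rough numbers on the per-reader costs quoted above, here is a small
back-of-the-envelope sketch. It only applies the figures from this thread
(1 bit per doc for deletions, 1 byte per doc per field for norms, one terms
index entry every 128 terms). The class and method names, and the example
index size of 10M docs and 50M terms, are assumptions made up for
illustration; the size of each terms index entry is not stated here, so only
the entry count is computed.

```java
public class ReaderCostSketch {

    /** Deleted docs bit vector: 1 bit per doc, rounded up to whole bytes. */
    static long deletedDocsBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    /** Norms: 1 byte per doc for each field that has been searched. */
    static long normsBytes(long maxDoc, int fieldsWithNorms) {
        return maxDoc * fieldsWithNorms;
    }

    /** Terms index: one entry for every `interval` terms, rounded up. */
    static long termsIndexEntries(long termCount, int interval) {
        return (termCount + interval - 1) / interval;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L;     // assumed index size
        long termCount = 50_000_000L;  // assumed number of unique terms

        System.out.println("deleted docs bytes:  " + deletedDocsBytes(maxDoc));
        System.out.println("norms bytes/field:   " + normsBytes(maxDoc, 1));
        System.out.println("terms index entries: " + termsIndexEntries(termCount, 128));
    }
}
```

The doc counts and delcounts a merge policy needs are a handful of integers
per segment by comparison, which is why lazy loading the rest is at least
conceivable.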

Mike

