On Fri, Mar 27, 2009 at 1:12 PM, Marvin Humphrey <mar...@rectangular.com> wrote:
>> Why must merge policy be made public for realtime search? [In Lucy]
>
> Because real-time search under Lucy needs to be able to operate using multiple
> write processes, since threads will not always be available.
>
> You need to be able to tell one indexer *not* to merge anything when
> performing fast updates, and you need to be able to tell another indexer what
> to merge when performing background consolidation.

Is this because you want to avoid swamping the IO system? I.e., you're
emulating IO prioritization. (Which I think makes sense, but it's more of an
optimization than strictly necessary for realtime search.)

The prototype near-realtime search in Lucene (on LUCENE-1516) is fully
independent of the merge policy (but, yes, a smarter merge policy can reduce
the turnaround times).

>> > Actually, if you're not warming sort caches, launching a Lucene IndexReader
>> > isn't obscenely expensive any more -- just expensive.  Right?
>>
>> We load deleted docs on init (1 bit per doc = fast), terms index (=
>> a lot of stuff every 128 terms = maybe slow), norms on the first search
>> that hits that field (1 byte per doc = probably OK), and FieldCache on
>> first search that uses it.  So "it depends" I guess?
>
> For the purposes of MergePolicy, all you would need are the doc counts and the
> delcounts, and optionally other stuff in SegmentInfos.  In theory you could
> lazy load the other stuff like the term dictionary index.  Obviously that
> would be an unacceptable behavioral change, but it's worth noting.

True.

Mike
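
For a rough sense of scale, the per-document figures quoted in this thread (1 bit per doc for deleted docs, 1 byte per doc per field for norms, one terms index entry every 128 terms) can be turned into a back-of-envelope estimate. This is only a sketch: the class and method names are hypothetical and not part of Lucene, the document and term counts are made-up inputs, and the real cost of a terms index entry depends on term length, which is not modeled here.

```java
// Back-of-envelope sizes for what a reader loads, using only the
// per-document figures quoted in the thread. Not Lucene API.
public class ReaderOpenCost {

    // Deleted docs: 1 bit per document, rounded up to whole bytes.
    static long deletedDocsBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    // Norms: 1 byte per document for each field that carries norms.
    static long normsBytes(long maxDoc, int fieldsWithNorms) {
        return maxDoc * fieldsWithNorms;
    }

    // Terms index: roughly one entry per `interval` terms, rounded up.
    // The byte size of each entry is left out on purpose.
    static long termsIndexEntries(long termCount, int interval) {
        return (termCount + interval - 1) / interval;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L;     // hypothetical index size
        long termCount = 50_000_000L;  // hypothetical unique term count
        int fieldsWithNorms = 2;       // hypothetical
        int interval = 128;            // figure quoted in the thread

        System.out.println("deleted docs bytes: " + deletedDocsBytes(maxDoc));
        System.out.println("norms bytes:        " + normsBytes(maxDoc, fieldsWithNorms));
        System.out.println("terms index entries: " + termsIndexEntries(termCount, interval));
    }
}
```

The point of the arithmetic is only that the deleted-docs bitset is small next to norms, and that the terms index scales with the number of unique terms rather than with the number of documents, which is consistent with the "it depends" answer above.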