Re: Realtime search best practices

Michael McCandless Mon, 12 Oct 2009 14:06:54 -0700

OK I just committed it -- thanks!

Mike


On Mon, Oct 12, 2009 at 5:01 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> That seems a lot more straightforward Mike, thanks.
>
>  -jake
>
> On Mon, Oct 12, 2009 at 1:56 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> I agree, the javadocs could be improved.  How about something like
>> this for the first 2 paragraphs:
>>
>>   * Returns a readonly reader, covering all committed as
>>   * well as un-committed changes to the index.  This
>>    * provides "near real-time" searching, in that changes
>>    * made during an IndexWriter session can be quickly made
>>   * available for searching without closing the writer nor
>>   * calling {...@link #commit}.
>>   *
>>   * <p>Note that this is functionally equivalent to calling
>>   * {#commit} and then using {...@link IndexReader#open} to
>>   * open a new reader.  But the turarnound time of this
>>   * method should be faster since it avoids the potentially
>>   * costly {...@link #commit}.<p>
>>
>> Mike
>>
>> On Mon, Oct 12, 2009 at 4:35 PM, Jake Mannix <jake.man...@gmail.com>
>> wrote:
>> > Thanks Yonik,
>> >
>> >  It may be surprising, but in fact I have read that
>> > javadoc.  It talks about not needing to close the
>> > writer, but doesn't specifically talk about the what
>> > the relationship between commit() calls and
>> > getReader() calls is.  I suppose I should have
>> > interpreted:
>> >
>> > "@returns a new reader which contains all
>> > changes..."
>> >
>> > to mean "all uncommitted changes", but why
>> > is it so obvious that what could be happening
>> > is that it only "returns all changes since the last
>> > commit, but without touching disk because it
>> > has docs in memory as well"?
>> >
>> >  -jake
>> >
>> > On Mon, Oct 12, 2009 at 1:26 PM, Yonik Seeley <
>> yo...@lucidimagination.com>wrote:
>> >
>> >> Guys, please - you're not new at this... this is what JavaDoc is for:
>> >>
>> >>  /**
>> >>   * Returns a readonly reader containing all
>> >>   * current updates.  Flush is called automatically.  This
>> >>   * provides "near real-time" searching, in that changes
>> >>   * made during an IndexWriter session can be made
>> >>   * available for searching without closing the writer.
>> >>   *
>> >>   * <p>It's near real-time because there is no hard
>> >>   * guarantee on how quickly you can get a new reader after
>> >>   * making changes with IndexWriter.  You'll have to
>> >>   * experiment in your situation to determine if it's
>> >>   * fast enough.  As this is a new and experimental
>> >>   * feature, please report back on your findings so we can
>> >>   * learn, improve and iterate.</p>
>> >>   *
>> >>   * <p>The resulting reader supports {...@link
>> >>   * IndexReader#reopen}, but that call will simply forward
>> >>   * back to this method (though this may change in the
>> >>   * future).</p>
>> >>   *
>> >>   * <p>The very first time this method is called, this
>> >>   * writer instance will make every effort to pool the
>> >>   * readers that it opens for doing merges, applying
>> >>   * deletes, etc.  This means additional resources (RAM,
>> >>   * file descriptors, CPU time) will be consumed.</p>
>> >>   *
>> >>   * <p>For lower latency on reopening a reader, you should
>> >>   * call {...@link #setMergedSegmentWarmer} to
>> >>   * pre-warm a newly merged segment before it's committed
>> >>   * to the index.  This is important for minimizing
>> >>   * index-to-search delay after a large merge.  </p>
>> >>   *
>> >>   * <p>If an addIndexes* call is running in another thread,
>> >>   * then this reader will only search those segments from
>> >>   * the foreign index that have been successfully copied
>> >>   * over, so far</p>.
>> >>   *
>> >>   * <p><b>NOTE</b>: Once the writer is closed, any
>> >>   * outstanding readers may continue to be used.  However,
>> >>   * if you attempt to reopen any of those readers, you'll
>> >>   * hit an {...@link AlreadyClosedException}.</p>
>> >>   *
>> >>   * <p><b>NOTE:</b> This API is experimental and might
>> >>   * change in incompatible ways in the next release.</p>
>> >>   *
>> >>   * @return IndexReader that covers entire index plus all
>> >>   * changes made so far by this IndexWriter instance
>> >>   *
>> >>   * @throws IOException
>> >>   */
>> >>  public IndexReader getReader() throws IOException {
>> >>
>> >>
>> >> -Yonik
>> >> http://www.lucidimagination.com
>> >>
>> >>
>> >> On Mon, Oct 12, 2009 at 4:18 PM, John Wang <john.w...@gmail.com> wrote:
>> >> > Oh, that is really good to know!
>> >> > Is this deterministic? e.g. as long as writer.addDocument() is called,
>> >> next
>> >> > getReader reflects the change? Does it work with deletes? e.g.
>> >> > writer.deleteDocuments()?
>> >> > Thanks Mike for clarifying!
>> >> >
>> >> > -John
>> >> >
>> >> > On Mon, Oct 12, 2009 at 12:11 PM, Michael McCandless <
>> >> > luc...@mikemccandless.com> wrote:
>> >> >
>> >> >> Just to clarify: IndexWriter.newReader returns a reader that searches
>> >> >> uncommitted changes as well.  Ie, you need not call
>> IndexWriter.commit
>> >> >> to make the changes visible.
>> >> >>
>> >> >> However, if you're opening a reader the "normal" way
>> >> >> (IndexReader.open) then it is necessary to first call
>> >> >> IndexWriter.commit.
>> >> >>
>> >> >> Mike
>> >> >>
>> >> >> On Mon, Oct 12, 2009 at 5:24 AM, melix <cedric.champ...@lingway.com>
>> >> >> wrote:
>> >> >> >
>> >> >> > Hi,
>> >> >> >
>> >> >> > I'm going to replace an old reader/writer synchronization mechanism
>> we
>> >> >> had
>> >> >> > implemented with the new near realtime search facilities in Lucene
>> >> 2.9.
>> >> >> > However, it's still a bit unclear on how to efficiently do it.
>> >> >> >
>> >> >> > Is the following implementation the good way to do achieve it ? The
>> >> >> context
>> >> >> > is concurrent read/writes on an index :
>> >> >> >
>> >> >> > 1. create a Directory instance
>> >> >> > 2. create a writer on this directory
>> >> >> > 3. on each write request, add document to the writer
>> >> >> > 4. on each read request,
>> >> >> >  a. use writer.getReader() to obtain an up-to-date reader
>> >> >> >  b. create an IndexSearcher with that reader
>> >> >> >  c. perform Query
>> >> >> >  d. close IndexSearcher
>> >> >> > 5. on application close
>> >> >> >  a. close writer
>> >> >> >  b. close directory
>> >> >> >
>> >> >> > While this seems to be ok, I'm really wondering about the
>> performance
>> >> of
>> >> >> > opening a searcher for each request. I could introduce some kind of
>> >> delay
>> >> >> > and cache a searcher for some seconds, but I'm not sure it's the
>> best
>> >> >> thing
>> >> >> > to do.
>> >> >> >
>> >> >> > Thanks,
>> >> >> >
>> >> >> > Cedric
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > View this message in context:
>> >> >>
>> >>
>> http://www.nabble.com/Realtime-search-best-practices-tp25852756p25852756.html
>> >> >> > Sent from the Lucene - Java Users mailing list archive at
>> Nabble.com.
>> >> >> >
>> >> >> >
>> >> >> >
>> ---------------------------------------------------------------------
>> >> >> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >> >> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >> >> >
>> >> >> >
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >> >>
>> >> >>
>> >> >
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >>
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Realtime search best practices

Reply via email to