Re: DefaultIndexAccessor

Mark Miller Mon, 04 Feb 2008 09:16:38 -0800

The purpose of IndexAccessor is to coordinate Readers/Writers for a Lucene index. Readers and Writers in Lucene are multi-threaded in that multiple threads may use them at the same time, but they should be shared, and there are special rules (for example, you cannot delete with a Reader while a Writer is working on the index). Also, you need to refresh Reader views every so often; this is expensive (though usually much less so with the new reopen method).

IndexAccessor enforces the rules and controls Reader refreshing. Instead of worrying about caching or index interaction rules, you just ask for your Reader/Writer, use it to search or add a doc, and then return it. The rest is taken care of for you.

This is done by keeping a cached Writer and Searcher(s) that all threads share. References to the Searchers are counted so that after a Writer is returned (and no other thread has a reference to the Writer), IndexAccessor waits for all of the current Searchers to come back and then reopens their Readers.
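To make the counting concrete, here is a minimal sketch in plain Java with no Lucene dependency. SearcherLease and its generation counter are hypothetical names I made up for illustration (the counter just stands in for a fresh Reader view); this is not the IndexAccessor code itself.

```java
/** Counts outstanding Searcher references so a reopen can wait until all are returned. */
class SearcherLease {
    private int outstanding;   // Searcher references currently handed out
    private int generation;    // bumped on every reopen; stands in for a new Reader view

    /** Hands out a reference and reports which generation the caller is searching. */
    synchronized int acquire() {
        outstanding++;
        return generation;
    }

    /** Returns a reference; wakes a pending reopen once the last one comes back. */
    synchronized void release() {
        if (outstanding == 0) {
            throw new IllegalStateException("nothing to release");
        }
        if (--outstanding == 0) {
            notifyAll();
        }
    }

    /** Called after the Writer is closed: waits for the Searchers, then "reopens". */
    synchronized int reopenWhenIdle() throws InterruptedException {
        while (outstanding > 0) {
            wait();
        }
        return ++generation;
    }

    synchronized int outstanding() {
        return outstanding;
    }
}
```

A search thread would call acquire(), search, and call release() in a finally block; the thread that closes the Writer would call reopenWhenIdle().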

In this regard, you get a similar setup to what Solr might give: from any thread you just add docs and run searches -- you don't have to worry about refreshing Readers or sharing Writers/Readers or one thread deleting with a Reader while another thread tries to write with a Writer.

This setup allows you to do other cool things, like warm Searchers before putting them into action. That's what the code I am posting soon will be capable of: when the Readers are reopened, search requests will still be handled by the old Readers while the new Searchers run a sample query with optional sort fields. This makes sure the Reader is open and its sort caches are loaded before the first thread tries to use it, which gives applications a much faster first response.
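Roughly, the warming idea looks like the sketch below. It is plain Java rather than the code I will be posting: WarmingSwap, reopenAndWarm, and the type parameter S (standing in for a Searcher) are names invented here, and the warm-up callback is where a sample query with sort fields would run.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Keeps serving the current searcher while a replacement is warmed, then swaps it in. */
class WarmingSwap<S> {
    private final AtomicReference<S> live;

    WarmingSwap(S initial) {
        live = new AtomicReference<>(initial);
    }

    /** What search threads call; it never waits for a reopen. */
    S current() {
        return live.get();
    }

    /**
     * Opens a new searcher, runs the warm-up against it, and only then publishes it.
     * Returns the previous searcher so the caller can close it once its users are done.
     */
    S reopenAndWarm(Supplier<S> opener, Consumer<S> warmUp) {
        S fresh = opener.get();
        warmUp.accept(fresh);          // old searcher is still the live one here
        return live.getAndSet(fresh);  // publish the warmed searcher
    }
}
```

The point of the design is simply that the swap happens after the warm-up, so no search thread ever pays for loading the caches.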

You must open a new Reader or reopen a Reader to see recently added docs...IndexAccessor provides no real way around that. But it does make the reopening much easier -- and your application that just wants to add docs and search at will from multiple threads won't have to worry about it.

You can bail out here, or if you want further clarification I will include an alternate attempt at explaining what IndexAccessor is below.


- Mark

----------------------------------------------------------------------------------------------------

When accessing a Lucene index from multiple threads, there are a variety of issues that you must address.


1. The Readers/Writer should be shared across threads.

2. Readers must periodically be refreshed, either by creating new instances or using the new reopen method.

3. A Reader that writes (for example, one that deletes docs) needs to be properly coordinated with a Writer, e.g. they cannot be used at the same time.


IndexAccessor addresses each of these issues.

How it works:

A single Writer is shared among threads that try to concurrently retrieve and use a Writer. Once all of these threads release their reference to the Writer, it is closed, and upon the next request a new one is created.
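
As an illustration of that Writer life cycle, here is a small sketch in plain Java. SharedWriterPool and FakeWriter are hypothetical stand-ins (FakeWriter only remembers whether it was closed); the real IndexAccessor classes and method names may differ.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a Writer; it only tracks whether it was closed. */
class FakeWriter {
    private boolean closed;

    void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

/** Shares one writer among all callers and closes it when the last reference is released. */
class SharedWriterPool {
    private final Supplier<FakeWriter> factory;
    private FakeWriter writer;   // null while no writer is open
    private int refCount;

    SharedWriterPool(Supplier<FakeWriter> factory) {
        this.factory = factory;
    }

    synchronized FakeWriter getWriter() {
        if (writer == null) {
            writer = factory.get();   // first request since the last close
        }
        refCount++;
        return writer;
    }

    synchronized void release(FakeWriter w) {
        if (w != writer || refCount == 0) {
            throw new IllegalStateException("writer was not obtained from this pool");
        }
        if (--refCount == 0) {
            writer.close();           // last reference gone: close it
            writer = null;            // the next getWriter() opens a new one
        }
    }
}
```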

A single Searcher for each Similarity is also shared across threads. Upon first request, a new Searcher is created. This Searcher is then returned upon every request. A count of every Searcher reference retrieved is maintained.

When all references to a Writer are released, the Writer is closed and, after waiting for all of the Searchers to be returned, the Searchers are reopened. Without warming enabled, new requests for Searchers/Readers must wait for this reopen to complete. If warming is enabled, the old Searchers/Readers continue handling Searcher requests until the Readers have been reopened and any requested sort caches have been loaded.

If you ask for a writing Reader, you will not get it until a Writer is released and vice versa.
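
One way to picture this exclusion is a small gate that any number of holders can share in one mode, but never in both modes at once. The sketch below is plain Java written for this explanation; WriteModeGate and its Mode names are hypothetical, and it makes no attempt at fairness between the two modes.

```java
/** Lets any number of holders share one mode, but never both modes at once. */
class WriteModeGate {
    enum Mode { WRITER, WRITING_READER }

    private Mode active;   // null while nobody holds the gate
    private int holders;

    /** Blocks until the other mode has been fully released. */
    synchronized void acquire(Mode mode) throws InterruptedException {
        while (holders > 0 && active != mode) {
            wait();
        }
        active = mode;
        holders++;
    }

    /** Non-blocking variant: returns false instead of waiting. */
    synchronized boolean tryAcquire(Mode mode) {
        if (holders > 0 && active != mode) {
            return false;
        }
        active = mode;
        holders++;
        return true;
    }

    synchronized void release(Mode mode) {
        if (holders == 0 || active != mode) {
            throw new IllegalStateException(mode + " is not held");
        }
        if (--holders == 0) {
            active = null;
            notifyAll();   // wake threads waiting for the other mode
        }
    }
}
```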

The result is that you can freely use Writers/Readers/Searchers from any thread without considering thread interactions. ***

If you want to add docs, just ask for a Writer, add the docs, and release the Writer. If you want to search, get a Searcher, search, and release the Searcher. You don't have to worry about reopening Readers or coordinating access.

***

You still do have to consider things like hogging the Writer/Readers - if you don't occasionally release them, things will not stay very interactive. The best method is to just get the object, use it, and then return it in a finally block. Batch load multiple docs, but if you're just randomly adding a doc, get the Writer, add it, and then release the Writer in a finally block. If you are batch loading a million docs and you want to be able to see them as they are added: get the Writer and add 10,000 docs (or something), release the Writer, get the Writer and add 10,000 docs, etc.
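
The chunked loading pattern might look something like the sketch below. WriterAccess and BatchLoader are hypothetical names made up for this example (the real IndexAccessor interface may look different), and the addDoc callback is a placeholder for whatever adds one doc to the Writer.

```java
import java.util.List;
import java.util.function.BiConsumer;

/** Hypothetical minimal accessor interface; the real IndexAccessor API may differ. */
interface WriterAccess<W> {
    W getWriter();

    void release(W writer);
}

class BatchLoader {
    /**
     * Adds docs in chunks, releasing the writer after each chunk so that
     * Readers get a chance to be reopened. Returns the number of chunks written.
     */
    static <W, D> int load(WriterAccess<W> access, List<D> docs, int chunkSize,
                           BiConsumer<W, D> addDoc) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        for (int start = 0; start < docs.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, docs.size());
            W writer = access.getWriter();
            try {
                for (D doc : docs.subList(start, end)) {
                    addDoc.accept(writer, doc);
                }
            } finally {
                access.release(writer);   // always hand the writer back, even on failure
            }
            chunks++;
        }
        return chunks;
    }
}
```

With a chunk size of 10,000, a million docs would mean 100 get/release cycles, and searches in between would see the docs added so far once the Readers have been reopened.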


Cam Bazz wrote:

Hello Mark,

I have been reading the code - and honestly I have not understood how it
works. I was hoping that this was a solution to the case when you are adding
documents - in a multithreaded way, it allows other non-writer threads to be
able to see documents added without refreshing the indexsearcher - by using
some caching mechanism.

Could you elaborate what IndexAccessor does and how it does it a little bit
more?

Best Regards,
-C.B.

On Feb 4, 2008 3:06 PM, Mark Miller <[EMAIL PROTECTED]> wrote:

IndexAccessor-1.26.2008.zip is the latest one. I will be dating a zip from
now on.

I hope to post new code with the warming either tonight or tomorrow night.
I would be ecstatic to have some help vetting that.

Also, I am thinking of making a change so that when you release the Writer
the thread that releases does not block until reopen. I think the original
author did this so that if you add a doc with a thread and then immediately
search from the same thread, you are guaranteed to find the doc. However,
this guarantee did not hold -- if another thread had a reference to the
Writer and a new thread grabbed a Writer and then quickly released before the
first thread, you will have added a doc but it will not be visible until the
first thread releases its reference to the Writer...since the concept is not
enforced anyway, you might as well not block for the final thread that
releases the Writer either. Instead I will grab a thread from a thread pool
to do the reopening with that thread, and return right after closing the
Writer. The result is that you cannot add a doc and search and expect to
find it without waiting a second or two. But this way things will be
consistent, and an app that adds docs will be a bit more responsive....eg it
won't hang as Readers are being reopened.

I also have to bring the AccessProvider classes back. No easy way to use
your own custom Readers without it...I shouldn't have stripped it out.

- Mark



Cam Bazz wrote:

Hello,

Regarding https://issues.apache.org/jira/browse/LUCENE-1026 , this seems
very interesting. I have read the discussion on the page, but I could not
figure out which set of files is the latest.
Is it the IndexAccessor-1.26.2008.zip file?

I will read through the code, make my own tests, and send some feedback.

Best.
-C.B.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


