The purpose of IndexAccessor is to coordinate Readers/Writers for a Lucene index. Readers and Writers in Lucene are thread-safe in that multiple threads may use them at the same time, but they should be shared across threads, and there are special rules (for example, you cannot delete with a Reader while a Writer is working on the index). You also need to refresh Reader views every so often, which is expensive (though usually much less so with the new reopen method).

IndexAccessor enforces the rules and controls Reader refreshing. Instead of worrying about caching or index interaction rules, you just ask for your Reader/Writer, use it to search or add a doc, and then return it. The rest is taken care of for you.

This is done by keeping a cached Writer and Searcher(s) that all threads share. References to the Searchers are counted so that after a Writer is returned (and no other thread has a reference to the Writer), IndexAccessor waits for all of the current Searchers to come back and then reopens their Readers.
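
To make the reference counting concrete, here is a rough sketch of the idea. It is not the actual IndexAccessor code: FakeWriter, SharedIndexAccess, and all of the method names are made up for illustration, and the Searcher is reduced to a generation number that bumps each time the Readers would be reopened.

```java
/** Hypothetical stand-in for a Lucene IndexWriter (not a Lucene class). */
class FakeWriter {
    private boolean closed = false;
    void close() { closed = true; }
    boolean isClosed() { return closed; }
}

/** Illustrative only: one shared Writer, counted Searcher references, reopen after the last Writer release. */
class SharedIndexAccess {
    private FakeWriter writer;              // cached Writer, or null when none is open
    private int writerRefs = 0;             // threads currently holding the Writer
    private int searcherRefs = 0;           // Searcher references currently checked out
    private boolean reopening = false;      // true while waiting for Searchers and reopening
    private int searcherGeneration = 0;     // bumps every time the Readers are "reopened"

    synchronized FakeWriter getWriter() {
        while (reopening) await();
        if (writer == null) writer = new FakeWriter();
        writerRefs++;
        return writer;
    }

    /** Returns the generation of the shared Searcher (a stand-in for the Searcher itself). */
    synchronized int getSearcher() {
        while (reopening) await();          // without warming, requests wait for the reopen
        searcherRefs++;
        return searcherGeneration;
    }

    synchronized void releaseSearcher() {
        searcherRefs--;
        notifyAll();                        // a pending reopen may now be able to proceed
    }

    synchronized void releaseWriter() {
        writerRefs--;
        if (writerRefs > 0) return;         // another thread still holds the Writer
        writer.close();
        writer = null;
        reopening = true;
        try {
            while (searcherRefs > 0) await();   // wait for all Searchers to come back
            searcherGeneration++;               // real code would reopen the Readers here
        } finally {
            reopening = false;
            notifyAll();
        }
    }

    /** Waits on this object's monitor; only called from synchronized methods. */
    private void await() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

Note that in this sketch a thread must return its Searcher before it makes the final Writer release, otherwise it would end up waiting on itself.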

In this regard, you get a similar setup to what Solr might give: from any thread you just add docs and run searches -- you don't have to worry about refreshing Readers or sharing Writers/Readers or one thread deleting with a Reader while another thread tries to write with a Writer.

This setup allows you to do other cool things, like warm Searchers before putting them into action. That's what the code I am posting soon will be capable of: when the Readers are reopened, search requests will still be handled by the old Readers while the new Searchers run a sample query with optional sort fields. This makes sure the new Reader is open and its sort caches are loaded before the first thread tries to use it, so applications get a much faster response.
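
The warming idea boils down to "warm first, swap second". A minimal sketch, assuming made-up names (FakeSearcher and WarmingSwap are not Lucene or IndexAccessor classes) and leaving out the reference counting needed to close the old Searcher once it is no longer in use:

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a Searcher over a reopened Reader. */
class FakeSearcher {
    final int generation;
    private volatile boolean warmed = false;

    FakeSearcher(int generation) { this.generation = generation; }

    /** Pretend that running any query loads the sort caches. */
    void search(String query) { warmed = true; }

    boolean isWarmed() { return warmed; }
}

/** Illustrative only: the old Searcher keeps serving until the new one has been warmed. */
class WarmingSwap {
    private final AtomicReference<FakeSearcher> current =
            new AtomicReference<>(new FakeSearcher(0));

    /** Search threads always get whatever Searcher is currently published. */
    FakeSearcher getSearcher() { return current.get(); }

    void reopenWithWarming(String warmQuery) {
        FakeSearcher fresh = new FakeSearcher(current.get().generation + 1);
        fresh.search(warmQuery);   // the old Searcher is still being handed out while this runs
        current.set(fresh);        // publish only after warming has finished
    }
}
```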

You must open a new Reader or reopen a Reader to see recently added docs; IndexAccessor provides no real way around that. But it does make the reopening much easier, and an application that just wants to add docs and search at will from multiple threads won't have to worry about it.

You can bail out here, or if you want further clarification I will include an alternate attempt at what IndexAccessor is below.

- Mark

----------------------------------------------------------------------------------------------------
When accessing a Lucene index from multiple threads, there are a variety of issues that you must address.

1. The Readers/Writer should be shared across threads.
2. Readers must periodically be refreshed, either by creating new instances or using the new reopen method.
3. A Reader that writes needs to be properly coordinated with a Writer, e.g., they cannot be used at the same time.

IndexAccessor addresses each of these issues.

How it works:

A single Writer is shared among threads that try to concurrently retrieve and use a Writer. Once all of these threads release their reference to the Writer, it is closed and upon the next request a new one is created.

A single Searcher for each Similarity is also shared across threads. Upon first request, a new Searcher is created. This Searcher is then returned upon every request. A count of every Searcher reference retrieved is maintained.

When all references to a Writer are released, the Writer is closed and after waiting for all of the Searchers to be returned, the Searchers are reopened. Without warming enabled, new requests for Searchers/Readers must wait for this reopen to complete. If warming is enabled, the old Searchers/Readers continue handling Searcher requests until the Readers have been reopened and any requested sort caches have been loaded.

If you ask for a writing Reader, you will not get it until a Writer is released and vice versa.
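
The Writer / writing Reader rule can be pictured as a gate with two counters. This is only a sketch under my own naming (WriteAccessGate and its methods are hypothetical, not the real IndexAccessor API), and it ignores fairness, so one side could starve the other under heavy load:

```java
/** Illustrative only: Writers and writing Readers never hold the index at the same time. */
class WriteAccessGate {
    private int writerRefs = 0;          // threads currently holding the Writer
    private int writingReaderRefs = 0;   // threads currently holding a writing Reader

    synchronized void acquireWriter() {
        while (writingReaderRefs > 0) await();   // wait until no writing Reader is out
        writerRefs++;
    }

    synchronized void releaseWriter() {
        writerRefs--;
        notifyAll();
    }

    synchronized void acquireWritingReader() {
        while (writerRefs > 0) await();          // wait until the Writer has been released
        writingReaderRefs++;
    }

    synchronized void releaseWritingReader() {
        writingReaderRefs--;
        notifyAll();
    }

    /** Waits on this object's monitor; only called from synchronized methods. */
    private void await() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```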

The result is that you can freely use Writers/Readers/Searchers from any thread without considering thread interactions. ***

If you want to add docs, just ask for a Writer, add the docs, and release the Writer. If you want to search, get a Searcher, search, and release the Searcher. You don't have to worry about reopening Readers or coordinating access.


***
You still have to consider things like hogging the Writer/Readers: if you don't occasionally release them, things will not stay very interactive. The best method is to just get the object, use it, and then return it in a finally block. Batch load multiple docs when you can, but if you're just adding the occasional doc, get the Writer, add the doc, and then release the Writer in a finally block. If you are batch loading a million docs and you want to be able to see them as they are added: get the Writer and add 10,000 docs (or something), release the Writer, get the Writer again and add another 10,000 docs, etc.
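
As a sketch of that batch pattern: the two interfaces below are hypothetical stand-ins that I made up for the example (they are not the real IndexAccessor or Lucene signatures), and a doc is just a String. The point is only the get / use / release-in-finally shape, repeated per batch.

```java
import java.util.List;

/** Hypothetical stand-in for the Writer handed out by the accessor. */
interface DocWriter {
    void addDocument(String doc);
}

/** Hypothetical stand-in for the accessor; the method names are assumptions. */
interface WriterAccessor {
    DocWriter getWriter();
    void release(DocWriter writer);
}

class BatchLoader {
    /** Adds docs in batches, releasing the Writer after each batch so Readers can be reopened. */
    static void load(WriterAccessor accessor, List<String> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            DocWriter writer = accessor.getWriter();
            try {
                for (String doc : docs.subList(start, end)) {
                    writer.addDocument(doc);
                }
            } finally {
                accessor.release(writer);   // always give the Writer back, even on failure
            }
        }
    }
}
```

With batchSize set to 10,000 this matches the million-doc example above; each release gives the Searchers a chance to pick up the docs added so far.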

Cam Bazz wrote:
Hello Mark,

I have been reading the code - and honestly I have not understood how it
works. I was hoping that this was a solution to the case when you are adding
documents - in a multithreaded way, it allows other non-writer threads to be
able to see documents added without refreshing the indexsearcher - by using
some caching mechanism.

Could you elaborate what IndexAccessor does and how it does it a little bit
more?

Best Regards,
-C.B.

On Feb 4, 2008 3:06 PM, Mark Miller <[EMAIL PROTECTED]> wrote:

IndexAccessor-1.26.2008.zip is the latest one. I will be dating a zip from
now on.

I hope to post new code with the warming either tonight or tomorrow night.
I would be ecstatic to have some help vetting that.

Also, I am thinking of making a change so that when you release the Writer
the thread that releases does not block until reopen. I think the original
author did this so that if you add a doc with a thread and then immediately
search from the same thread, you are guaranteed to find the doc. However,
this guarantee did not hold -- if another thread had a reference to the
Writer and a new thread grabbed a Writer and then quickly released before the
first thread, you will have added a doc but it will not be visible until the
first thread releases its reference to the Writer...since the concept is not
enforced anyway, you might as well not block for the final thread that
releases the Writer either. Instead I will grab a thread from a thread pool
to do the reopening with that thread, and return right after closing the
Writer. The result is that you cannot add a doc and search and expect to
find it without waiting a second or two. But this way things will be
consistent, and an app that adds docs will be a bit more responsive, e.g., it
won't hang as Readers are being reopened.

I also have to bring the AccessProvider classes back. No easy way to use
your own custom Readers without it...I shouldn't have stripped it out.

- Mark



Cam Bazz wrote:
Hello,

Regarding https://issues.apache.org/jira/browse/LUCENE-1026 , this seems
very interesting. I have read the discussion on the page, but I could not
figure out which set of files is the latest.
Is it the IndexAccessor-1.26.2008.zip file?

I will read through the code, make my own tests, and send some feedback.

Best.
-C.B.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



