Hello Mark, Thank you for your lengthy and valuable clarification. I have the case - before adding to the index, i must check if a document exist with the same key (actually, double key) - or before deleting a document - I must ensure it exists in the index.
Currently I am doing it with my custom caching routine. It works quite well upto 32M documents. but after that something happens and it really slows down. I will experiment with your implementation, as soon as I can. It is very cool by the way. Will it be included in the next release? Best, -C.B. On Feb 4, 2008 7:15 PM, Mark Miller <[EMAIL PROTECTED]> wrote: > The purpose of IndexAccessor is to coordinate Readers/Writers for a > Lucene index. Readers and Writers in Lucene are multi-threaded in that > multiple threads may use them at the same time, but they must/should be > shared and there are special rules (You cannot delete with a Reader > while a Writer is working on the index). Also, you need to refresh > Reader views every so often; this is expensive (though usually much less > so with the new reopen method). > > IndexAccessor enforces the rules and controls Reader refreshing. Instead > of worrying about caching or index interaction rules, you just ask for > your Reader/Writer, use it to search or add a doc, and then return it. > The rest is taken care of for you. > > This is done by keeping a cached Writer and Searcher(s) that all threads > share. References to the Searchers are counted so that after a Writer is > returned (and no other thread has a reference to the Writer), > IndexAccessor waits for all of the current Searchers to come back and > then reopens their Readers. > > In this regard, you get a similar setup to what Solr might give: from > any thread you just add docs and run searches -- you don't have to worry > about refreshing Readers or sharing Writers/Readers or one thread > deleting with a Reader while another thread tries to write with a Writer. > > This setup allows you to do other cool things, like warm Searchers > before putting them into action. Thats what the code I am posting soon > is be capable of - when the Readers are reopened, search requests will > still be handled by the old Readers while the new Searchers run a sample > query with optional sort fields. This will make sure the Reader is open > and its sort caches are loaded before the first thread tries to use it. > Much faster response to applications. > > You must open a new Reader or reopen a Reader to see recently added > docs...IndexAccessor provides no real way around that. But it does make > the reopening much easier -- and your application that just wants to add > docs and search at will from multiple threads, won't have to worry about > it. > > You can bail out here, or if you want further clarification I will > include an alternate attempt at what IndexAccessor is below. > > - Mark > > > ---------------------------------------------------------------------------------------------------- > When accessing a Lucene index from multiple threads, there are a variety > of issues that you must address. > > 1. The Readers/Writer should be shared across threads. > 2. Readers must periodically be refreshed, either be creating new > instances or using the new reopen method. > 3. A Reader that writes needs to be properly coordinated with a Writer > eg they cannot be used at the same time. > > IndexAccessor addresses each of these issues. > > How it works: > > A single Writer is shared among threads that try to concurrently > retrieve and use a Writer. Once all of these threads release their > reference > to the Writer, it is closed and upon the next request a new one is > created. > > A single Searcher for each Similarity is also shared across threads. > Upon first request, a new Searcher is created. This Searcher is then > returned > upon every request. A count of every Searcher reference retrieved is > maintained. > > When all references to a Writer are released, the Writer is closed and > after waiting for all of the Searchers to be returned, the Searchers are > reopened. Without warming enabled, new requests for Searchers/Readers > must wait for this reopen to complete. If warming is enabled, the old > Searchers/Readers continue handling Searcher requests until the Readers > have been reopened and any requested sort caches have been loaded. > > If you ask for a writing Reader, you will not get it until a Writer is > released and vice versa. > > The result is that you can freely use Writers/Readers/Searchers from any > thread without considering thread interactions. *** > > If you want to add docs, just ask for a Writer, add the docs, and > release the Writer. If you want to search, get a Searcher, search, > and release the Searcher. You don't have to worry about reopening > Readers or coordinating access. > > > *** > You still do have to consider things like hogging the Writer/Readers - > if you don't occasionally release them, things will not stay very > interactive. > The best method is to just get the object, use it, and then return it in > a finally block. Batch load multiple docs, but if your just randomly > adding > a doc, get the Writer, add it, and then release the Writer in a finally > block. If you are batch loading a million docs and you want to be able > to see them > as they are added: get the writer and add 10,000 docs (or something), > release the Writer, get the Writer and add 10,000 docs, etc. > > Cam Bazz wrote: > > Hello Mark, > > > > I have been reading the code - and honestly I have not understood how it > > works. I was hoping that this was a solution to the case when you are > adding > > documents - in a multithreaded way, it allows other non-writer threads > to be > > able to see documents added without refreshing the indexsearcher - by > using > > some caching mechanism. > > > > Could you elaborate what IndexAccessor does and how it does it a little > bit > > more? > > > > Best Regards, > > -C.B. > > > > On Feb 4, 2008 3:06 PM, Mark Miller <[EMAIL PROTECTED]> wrote: > > > > > >> IndexAccessor-1.26.2008.zip is the latest one. I will be dating a zip > from > >> now on. > >> > >> I hope to post new code with the warming either tonight or tomorrow > night. > >> I would be ecstatic to have some help vetting that. > >> > >> Also, I am thinking of making a change so that when you release the > Writer > >> the thread that releases does not block until reopen. I think the > original > >> author did this so that if you add a doc with a thread and then > immediately > >> search from the same thread, you are guaranteed to find the doc. > However, > >> this gaurentee did not hold -- if another thread had a reference to the > >> Writer and a new thread grabbed a Writer and then quicly released > before the > >> first thread, you will have added a doc but it will not be visible > until the > >> first thread releases its reference to the Writer...since the concept > is not > >> enforced anyway, you might as well not block for the final thread that > >> releases the Writer either. Instead I will grab a thread from a thread > pool > >> to do the reopening with that thread, and return right after closing > the > >> Writer. The result is that you cannot add a doc and search and expect > to > >> find it without waiting a second or too. But this way things will be > >> consistent, and an app that adds docs will be a bit more > responsive....eg it > >> wont hang as Readers are being reopened. > >> > >> I also have to bring the AccessProvider classes back. No easy way to > use > >> your own custom Readers without it...I shouldn't have stripped it out. > >> > >> - Mark > >> > >> > >> > >> Cam Bazz wrote: > >> > >>> Hello, > >>> > >>> Regarding https://issues.apache.org/jira/browse/LUCENE-1026 , this > seems > >>> very interesting. I have read the discussion on the page, but I could > >>> > >> not > >> > >>> figure out which set of files is the latest. > >>> Is it the IndexAccessor-1.26.2008.zip file? > >>> > >>> I will read through the code, make my own tests, and send some > feedback. > >>> > >>> Best. > >>> -C.B. > >>> > >>> > >>> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: [EMAIL PROTECTED] > >> For additional commands, e-mail: [EMAIL PROTECTED] > >> > >> > >> > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >