Re: Concurrent Indexing + Searching

Mark Miller Tue, 05 Feb 2008 04:23:42 -0800


ajay_garg wrote:

Thanks Mark.

Ok, I got your point. So it happens like this :

a) If it is me, who is re-opening an IndexReader, at any time, but
"manually-programmatically". That is, I don't want
a-sort-of-automatic-reopening-of-IndexWriter, then I am fine.

Sure...you're kind of doing what IndexAccessor does...choosing when to reopen the views using some metric. Just follow Lucene access rules (no writing ops with a Reader while another thread uses a Writer etc.) Also, you want to share Searchers and Writers across threads.

b) If I do wish this automatic-reopening of index (using IndexAccessor),
then I am forced to rely on all the indexer threads releasing the reference
to IndexWriter, which by the way, as a developer, can never be sure of (that
is, I don't have any control, as to when exactly all the threads leave the
reference ).

You have fairly decent control...it's all running on the server. A client would be making a call to the server, which would run the code. To start, release in a finally block, and second, avoid any infinite loops or what not, and you have a fair amount of control here. As long as your computer can compute and make forward progress, even if any exception is thrown, things will get released. One year plus at many sites and I have never seen anything not get released unless the whole server went down, in which case I cannot do anything anyway. Now if you're constantly bombarded with write operations that just never let up...sure - but you're still the code behind the curtain...you can write some code that looks for such a bombardment. I think the control is pretty good. I guess the point is that the client is not what's using IndexAccessor...it's making a request to the server which then uses IndexAccessor.
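
As a rough illustration of the "release in a finally block" point, here is a minimal sketch in plain Java. The class and method names (ReleaseInFinally, acquireWriter, releaseWriter) are hypothetical stand-ins and not the real IndexAccessor API; a simple counter plays the role of the shared Writer's reference count.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an accessor that hands out a shared writer. */
public class ReleaseInFinally {
    // Number of threads currently holding the (imaginary) shared writer.
    static final AtomicInteger holders = new AtomicInteger();

    static void acquireWriter() { holders.incrementAndGet(); }
    static void releaseWriter() { holders.decrementAndGet(); }

    /** Runs one indexing task; the release happens even if the task throws. */
    static void index(Runnable task) {
        acquireWriter();
        try {
            task.run();
        } finally {
            releaseWriter();
        }
    }

    public static void main(String[] args) {
        try {
            index(() -> { throw new IllegalStateException("simulated failure"); });
        } catch (IllegalStateException expected) {
            // The failure propagates to the caller, but the hold was still released.
        }
        System.out.println("holders after failure: " + holders.get()); // prints 0
    }
}
```

Because the release sits in the finally clause, the count returns to zero whether the task completes or throws, which is the property being relied on above.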

Will be obliged if you could give a confirmation to my understanding.

Thanks
Ajay Garg

markrmiller wrote:
You are right that if auto-commit=true and a user reopens an IndexReader, the docs will absolutely be visible as they are flushed. I think the part you are missing is that you need to be cooperating with the IndexAccessor: a user should not be reopening an IndexReader. The whole point of IndexAccessor is to coordinate these things...when a Writer is released, we know the index has changed, so that is when the IndexReaders are reopened for you. Because the IndexWriter is cached and shared by Threads, a thread might release the Writer while another is still using it...that is why things are not reopened and the Writer not closed until the last thread releases its reference to it. Essentially, IndexAccessor controls visibility by controlling how current the view of the Readers is, by controlling their reopening -- a user should agree not to reopen -- just like he must agree not to use a ReadingWriter to delete.
If you want to just set an IndexWriter to indexing for eternity and then have some Readers that you occasionally reopen, you don't need IndexAccessor. Its purpose is to coordinate ReadingReaders, WritingReaders, Searchers, and Writers for you. You are proposing to coordinate them yourself. IndexAccessor reopens Readers for you after a Writer has been used, and enforces Lucene requirements, like a WritingReader cannot be used at the same time as a Writer...etc.
Technically, IndexAccessor could reopen the readers every 2 seconds...and then you would see your changes...instead it only tries to reopen them if a change has been made to the index...and it does not want to get greedy if a Writer is batch loading, so it waits for you to release the Writer. You can control how often the 'view' is updated by releasing the Writer more often -- say every 50 docs. Write 50 docs, release, get, write 50 docs.
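
To make the release-driven refresh concrete, here is a simplified model in plain Java. It is only a sketch of the idea described above, not the actual IndexAccessor code: the class RefCountedView and its methods are hypothetical, a list stands in for the index, and a copied list stands in for the reopened Readers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of release-driven view refresh. */
public class RefCountedView {
    private final List<String> index = new ArrayList<>(); // everything written so far
    private List<String> view = new ArrayList<>();        // snapshot that searches see
    private int writerHolders = 0;                         // threads holding the writer
    private boolean dirty = false;                         // index changed since last refresh

    public synchronized void getWriter() { writerHolders++; }

    public synchronized void addDocument(String doc) {
        if (writerHolders == 0) throw new IllegalStateException("no writer held");
        index.add(doc);
        dirty = true;
    }

    /** Refreshes the view only when the last holder releases and something changed. */
    public synchronized void releaseWriter() {
        if (writerHolders == 0) throw new IllegalStateException("writer not held");
        writerHolders--;
        if (writerHolders == 0 && dirty) {
            view = new ArrayList<>(index); // stands in for reopening the Readers
            dirty = false;
        }
    }

    public synchronized int visibleDocs() { return view.size(); }

    public static void main(String[] args) {
        RefCountedView accessor = new RefCountedView();
        accessor.getWriter();
        for (int i = 1; i <= 120; i++) {
            accessor.addDocument("doc-" + i);
            if (i % 50 == 0) {            // write 50 docs, release, get, write 50 docs
                accessor.releaseWriter();
                accessor.getWriter();
            }
        }
        System.out.println(accessor.visibleDocs()); // prints 100
        accessor.releaseWriter();
        System.out.println(accessor.visibleDocs()); // prints 120
    }
}
```

With a single indexing thread, each release after 50 documents updates the view. With several threads holding the writer, the view would only move when the count drops to zero, which is the behaviour being discussed in this thread.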
- Mark

ajay_garg wrote:
@Mark.

I am sorry, but I need a bit more of explanation. So you mean to say ::

"If auto-commit is false, then of course, docs will not be visible in the
index, until all the threads release themselves out of a particular
IndexWriter instance, and close() the IndexWriter instance.
If auto-commit is true, even then the above holds true. In particular,
let's say I need an application with the following requirements ::
a) There are multiple indexer threads indexing on a SINGLE IndexWriter
instance with auto-commit true
b) Each thread 'flushes' according to a pre-defined criteria at some
point of time.
c) The index should be updated immediately, that is, if any user re-opens
the IndexSearcher, then the documents added till-that-snapshot-of-index
must be visible. Note that the IndexWriter instance hasn't been closed as
yet, the indexer threads will be indexing till eternity, so that
IndexWriter instance will never be closed.
So, you presume that building an application with the above requirements
is impossible, even with auto-commit set to true. "

( If I sound ambiguous at any point, kindly forgive me for my lack of
language skills. I will try to explain better, if need arises ).

Looking forward to a reply
Ajay Garg

markrmiller wrote:
You are correct that autocommit=true means that docs will be in the index before the last thread releases its concurrent hold on a Writer, *but because IndexAccessor controls* *when the IndexSearchers are reopened*, those docs will still not be visible until the last thread holding a Writer releases it...that is when the reopening of Searchers occurs as well as when the Writer is closed.
- Mark

ajay_garg wrote:
Hi. Sorry if I seem a stranger in this thread, but there is something
that I can't resist clearing myself on.

Mark, you say that the additional documents added to an index won't
show up until the # of threads accessing the index hits 0; and
subsequently the IndexWriter instance is closed.

But I suppose that autocommit=true asserts that all flushed (added)
documents are immediately committed ( and hence visible ) in the index,
and no explicit closing ( releasing ) of the IndexWriter instance is
required.
( Of course, re-opening an IndexSearcher instance is required ).

Am I being dumb ?

Looking eagerly for you to shed some light on my doubt.

Thanks
Ajay Garg


codetester wrote:
Hi All,

A newbie out here.... I am using Lucene 2.3.0. I need to use Lucene to
perform live searching and indexing. To achieve that, I tried the
following:

FSDirectory directory = FSDirectory.getDirectory(location);
IndexReader reader = IndexReader.open(directory);
// create = true: I want to recreate the index every time
IndexWriter writer = new IndexWriter(directory, new SimpleAnalyzer(), true);
IndexSearcher searcher = new IndexSearcher(reader);

For searching, I have the following code:
QueryParser queryParser = new QueryParser("xyz", new StandardAnalyzer());
Hits hits = searcher.search(queryParser.parse(displayName + "*"));

And for adding records, I have the following code:
// Create doc object
writer.addDocument(doc);

IndexReader newIndexReader = reader.reopen();
if (newIndexReader != reader) {
    reader.close();
}
reader = newIndexReader;
searcher.close();
searcher = new IndexSearcher(reader);

So the issues that I face are:
1) The addition of a new record is not reflected in the search ( even
though I have reinitialized the IndexSearcher ).

2) Obviously, the add record code is not thread safe. I am trying to
close and update the reference to the IndexSearcher object. I could add a
sync block, but the bigger question is: what is the ideal way to handle
this case, where I need to add and search records in real time?
Thanks !
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]