Nader Henein wrote:
Our setup is quite similar to yours, but in all honesty you will need
to do some form of batching on your updates, simply because you don't
want to keep the IndexWriter open all the time.
For now, the index writer is closed after each added document. This does
not seem to carry a major overhead compared to keeping it open: at most
2x in my tests, which is acceptable for now and on par with the other
commercial search engines we have been using. My constraint is basically
that the mergeFactor must be 1, but honestly I think it will need to be
relaxed when the document rate increases.
There has been no tuning yet.
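The batching suggested above can be sketched independently of the Lucene API: buffer incoming documents and hand them to a single flush action once a threshold is reached, so one writer session serves many documents instead of one. The `DocumentBatcher` class and its flush callback below are hypothetical names, not from this thread; in practice the flush action would open one IndexWriter, add the whole batch, and close it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: collect documents and pass them to a single
// flush action in batches, so the (relatively expensive) writer
// open/close cycle happens once per batch instead of once per document.
public class DocumentBatcher {
    private final int batchSize;
    private final Consumer<List<String>> flushAction; // e.g. one IndexWriter session
    private final List<String> buffer = new ArrayList<>();

    public DocumentBatcher(int batchSize, Consumer<List<String>> flushAction) {
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    public void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Also called explicitly on shutdown or on a timer, so a
    // partially filled batch is never stranded in memory.
    public void flush() {
        if (!buffer.isEmpty()) {
            flushAction.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

A size-based trigger is the simplest policy; a real deployment would usually add a time-based flush as well so low-traffic periods do not delay indexing indefinitely.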
I also have a quite specific document lifecycle. Incoming documents are
5-10 KB XML, from which I extract only 0.5-1 KB of data to be indexed.
These documents NEVER change: they are not updated, and they are deleted
only for archiving purposes, because we keep just the last 6 months of
data.
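The 6-month retention rule described above can be sketched as a pure selection step: given each document id and its ingest timestamp, pick the ids that have aged out and should be removed from the index and archived. The `RetentionPolicy` name and the id-to-timestamp map are hypothetical illustrations, not anything from this thread.

```java
import java.time.Instant;
import java.time.Period;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a 6-month retention rule: documents are never
// updated, so the only delete path is age-based archival.
public class RetentionPolicy {
    public static List<String> expiredIds(Map<String, Instant> ingestTimes, Instant now) {
        // Calendar-aware cutoff: exactly six months before "now".
        Instant cutoff = ZonedDateTime.ofInstant(now, ZoneOffset.UTC)
                .minus(Period.ofMonths(6))
                .toInstant();
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Instant> e : ingestTimes.entrySet()) {
            if (e.getValue().isBefore(cutoff)) {
                expired.add(e.getKey());
            }
        }
        return expired;
    }
}
```

Because documents are immutable, this sweep can run as a periodic background job without worrying about concurrent updates to the same document.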
As for clustering, we went through three iterations, each keeping x
indexes parallelized on x servers, all with failover and
index-independent synchronization with your persistent store. There was
a little discussion about this a few weeks back, and I mentioned that
your biggest pain will be maintaining the integrity of parallel indexes
that are updated/deleted autonomously (atomic updates and deletes), but
there are ways of running iterative checks to make sure that your
indices stay clean.
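One way to sketch the iterative integrity check mentioned above (all names here are hypothetical, not from the thread): compare the set of document ids held by the persistent store with the set present in one parallel index, and report both directions of drift.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of an index-vs-store consistency check for one
// of the parallel indexes: two set differences capture both failure
// modes of autonomous updates and deletes.
public class IndexAudit {
    public final Set<String> missingFromIndex; // in the store, but never indexed
    public final Set<String> orphanedInIndex;  // indexed, but deleted from the store

    public IndexAudit(Set<String> storeIds, Set<String> indexIds) {
        missingFromIndex = new HashSet<>(storeIds);
        missingFromIndex.removeAll(indexIds);
        orphanedInIndex = new HashSet<>(indexIds);
        orphanedInIndex.removeAll(storeIds);
    }
}
```

Run periodically against each index, the repair actions follow directly: re-index the missing ids and delete the orphaned ones, converging each replica back toward the store.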