Hi all,

Having read the mail in the mailing list archive about Best Indexing-Searching Practices, I have come up with the following architecture for my application. Please evaluate it and comment.
Figure: http://www.flickr.com/photos/[EMAIL PROTECTED]/49301053/

Explanation:

The primary indexer (a daemon) receives the documents to be indexed. It dispatches each document to one of the secondary indexer nodes (via load balancing). These indexing nodes index the documents in a RAMDirectory, periodically writing it out to a local index on the filesystem.

A cron process running on the central server (which contains the main index) periodically checks the secondary nodes for new or updated indexes (small in size). It copies these new (small) indexes to the central server (based on 'push changes onto main index').

An optimizer process running on the central server periodically merges the smaller, newer indexes into the main index and optimizes it. It also creates a checkpoint of the consistent index every time it performs an optimization (the index.DATE approach).

The 'updaters' (cron processes on the searcher nodes) periodically copy new checkpoints via rsync onto their local systems and create symbolic links to them (the same approach proposed and used by Doug for Technorati).

--
- Andy
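
For discussion, here is a minimal sketch of the last step (an updater pointing a searcher at the newest checkpoint), assuming the rsync copy has already finished. The names `latest_checkpoint`, `switch_current`, and the `current` link are hypothetical, and the sketch assumes checkpoints are directories named `index.YYYYMMDD` so that lexicographic order matches date order. It only illustrates the symlink switch on a POSIX filesystem; it is not Lucene code and not necessarily how the Technorati setup does it.

```python
import os


def latest_checkpoint(root):
    """Return the name of the newest index.* directory under root, or None.

    Assumption: names look like index.YYYYMMDD, so the lexicographically
    largest name is also the most recent checkpoint.
    """
    names = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        # Only real checkpoint directories count, not symlinks to them.
        if name.startswith("index.") and os.path.isdir(path) and not os.path.islink(path):
            names.append(name)
    return max(names) if names else None


def switch_current(root, link_name="current"):
    """Point root/link_name at the newest checkpoint and return its name."""
    newest = latest_checkpoint(root)
    if newest is None:
        return None
    link = os.path.join(root, link_name)
    tmp = link + ".tmp"
    # Clear a leftover temporary link from an interrupted earlier run.
    if os.path.lexists(tmp):
        os.remove(tmp)
    # Create the new link under a temporary name, then rename it over the
    # old one. On POSIX the rename is atomic, so a searcher opening
    # root/current sees either the old checkpoint or the new one.
    os.symlink(newest, tmp)
    os.replace(tmp, link)
    return newest
```

A searcher would then always open the index through the `current` link and reopen it after each switch. Old checkpoints can be removed later, once no searcher still has them open.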