Indexing-Searching Design

Anand Kishore Tue, 04 Oct 2005 10:05:59 -0700

Hi all,

Having read the message in the mailing list archive about "Best
Indexing-Searching Practices", I have come up with the following architecture
for my application. Please evaluate it and let me know your comments.


Figure:

http://www.flickr.com/photos/[EMAIL PROTECTED]/49301053/

Explanation:

The primary indexer (a daemon) receives the documents to be indexed and
dispatches each one to one of the secondary indexer nodes (via load
balancing). Each indexing node indexes its documents into a RAMDirectory and
periodically writes it out to a local index on the filesystem.
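The dispatch and buffering steps can be sketched in a few lines. This is a minimal model in Python, not Lucene code. The hypothetical `SecondaryIndexer` keeps documents in a plain list that stands in for the RAMDirectory. Round-robin is assumed as the load-balancing policy. Flushing is triggered by a document count (`flush_threshold`, also hypothetical) rather than by a timer.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class SecondaryIndexer:
    """Hypothetical stand-in for a secondary indexing node.

    ram_buffer plays the role of the RAMDirectory; disk_segments plays the
    role of the node's local on-disk index (one entry per flush).
    """
    name: str
    flush_threshold: int = 3  # assumed: flush by count, not by timer
    ram_buffer: list = field(default_factory=list)
    disk_segments: list = field(default_factory=list)

    def add(self, doc: str) -> None:
        self.ram_buffer.append(doc)
        if len(self.ram_buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Write the buffered documents out as one new small "segment".
        if self.ram_buffer:
            self.disk_segments.append(list(self.ram_buffer))
            self.ram_buffer.clear()


class PrimaryIndexer:
    """Hypothetical primary daemon: round-robin dispatch to the nodes."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._cycle = itertools.cycle(nodes)

    def dispatch(self, doc: str) -> SecondaryIndexer:
        node = next(self._cycle)
        node.add(doc)
        return node
```

A least-loaded policy could replace `itertools.cycle` if document sizes vary a lot. An example would be picking the node with the shortest `ram_buffer`. Round-robin is simply the easiest policy to reason about.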

A cron process running on the central server (which holds the main index)
periodically checks the secondary nodes for any new or updated indexes (small
in size) and copies them to the central server (following the 'push changes
onto main index' approach). An optimizer process running on the central server
periodically merges these smaller, newer indexes into the main index and
optimizes it. Every time it performs an optimization, it also creates a
checkpoint of the consistent index (the index.DATE approach).
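The pull-and-checkpoint cycle on the central server might look roughly like the sketch below. It makes three assumptions:

- Each secondary node's small index is a directory the cron job can read.
- Changes are detected by comparing file modification times against the time of the last pull.
- DATE is a fixed-width `YYYYMMDDHHMMSS` stamp.

The functions `pull_new_indexes` and `make_checkpoint` are hypothetical. Plain file copying stands in for the real merge/optimize step. In Lucene that step would go through `IndexWriter`, for example its `addIndexes` and `optimize` methods in releases of that era.

```python
import shutil
import time
from pathlib import Path


def newest_mtime(index_dir: Path) -> float:
    """Latest modification time of any file inside an index directory."""
    return max(
        (p.stat().st_mtime for p in index_dir.rglob("*") if p.is_file()),
        default=0.0,
    )


def pull_new_indexes(node_dirs, staging_dir: Path, last_pull: float) -> list:
    """Hypothetical cron step: copy each node index changed since last_pull."""
    pulled = []
    for node_dir in node_dirs:
        if newest_mtime(node_dir) > last_pull:
            dest = staging_dir / node_dir.name
            shutil.copytree(node_dir, dest, dirs_exist_ok=True)
            pulled.append(dest)
    return pulled


def make_checkpoint(root: Path, main_index: Path, new_indexes,
                    stamp: str = None) -> Path:
    """Hypothetical optimizer step: build a complete index.<stamp> directory.

    Copying files stands in for the real merge/optimize.
    """
    stamp = stamp or time.strftime("%Y%m%d%H%M%S")
    tmp = root / f".index.{stamp}.tmp"
    # Start from a copy of the current main index; it is never modified.
    shutil.copytree(main_index, tmp)
    for idx in new_indexes:
        # Each pulled index lands in its own subdirectory; newer files
        # overwrite older copies of the same node's index.
        shutil.copytree(idx, tmp / idx.name, dirs_exist_ok=True)
    checkpoint = root / f"index.{stamp}"
    # Rename only once complete, so readers never see a partial checkpoint.
    tmp.rename(checkpoint)
    return checkpoint
```

Each checkpoint is built under a hidden temporary name and then renamed into place. As a result, a half-written index.DATE directory is never visible to the updaters. Older checkpoints are left untouched, so searchers can keep using them until they switch.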

The 'updaters' (cron processes on the searcher nodes) periodically copy new
checkpoints onto their local filesystems via rsync and create symbolic links
to them (the same scheme Doug proposed and used for Technorati).
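On each searcher node, the updater's final step can be modeled as picking the newest index.DATE directory and repointing a symbolic link at it. This step runs after rsync has finished copying a checkpoint. The sketch assumes POSIX semantics and a fixed-width `YYYYMMDDHHMMSS` DATE stamp, so that directory names sort chronologically. The link name `current` and both functions are hypothetical, and this is not the Technorati script itself.

```python
import os
from pathlib import Path


def latest_checkpoint(root: Path):
    """Return the newest index.<stamp> directory, or None if there is none."""
    # Assumed fixed-width stamps sort lexically in time order.
    candidates = sorted(p for p in root.glob("index.*") if p.is_dir())
    return candidates[-1] if candidates else None


def switch_current(root: Path, link_name: str = "current") -> Path:
    """Hypothetical updater step: repoint root/<link_name> atomically."""
    target = latest_checkpoint(root)
    if target is None:
        raise FileNotFoundError(f"no index.* checkpoint under {root}")
    tmp_link = root / f".{link_name}.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()  # leftover from an interrupted run
    # A relative target keeps the link valid if the whole tree is moved.
    os.symlink(target.name, tmp_link)
    # rename(2) over an existing symlink is atomic on POSIX, so a searcher
    # opening the link sees either the old checkpoint or the new one.
    os.replace(tmp_link, root / link_name)
    return root / link_name
```

The atomic switch comes from creating the new link under a temporary name and then renaming it over `current`. Deleting and recreating the link instead would leave a brief window in which a searcher opening `current` finds nothing.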


--
- Andy
