If you do this on Windows, you might be able to replicate the indexes using DFS. On Linux you can probably use rsync to keep the different servers up to date.
If the size of the index is an issue, Lustre could be used to have one volume that's spread over many servers. Performance is supposed to be good with Lustre as well.

If you want to speed up individual queries when searching a large index, you can probably split up the index in some way among the servers, query them all at the same time and then aggregate the results. This is just an idea, but I believe it was mentioned in "Lucene in Action".

Russ

-----Original Message-----
From: "Peter W." <[EMAIL PROTECTED]>
Date: Thu, 4 Jan 2007 14:02:00
To: java-user@lucene.apache.org
Subject: Re: lucene scalability questions

Mark,

My understanding of Lucene is limited, but the issues seem similar to web server farms in that it comes down to linear scalability by adding more boxes. This means separate machines with their own indexes.

Shared filesystems such as NFS work well in smaller environments but run into problems under heavy load (lost mounts requiring reboots).

There's no MySQL-like 'replication' with masters using binary files to update slaves. However, since the index is file based, you can close IndexWriters and make hot copies or perform backups for redundancy.

If you know XML, use Solr to post and retrieve documents to and from your various Lucene indexes. It hides the complexity of remote object brokering such as RMI. Solr also allows you to get result sets using JSON, so you could provide distributed Lucene results to browsers as a .js widget.

While it does not reflect the latest 2.0 release, the Lucene in Action book provides good background on combining separate indexes.

Regards,

Peter W.

On Jan 4, 2007, at 7:51 AM, Mark Mei wrote:

> So this question has two parts:
>
> 1. How does Lucene scale, exactly? Do we distribute the index to multiple
> servers somehow? Or is it one index, sitting on some sort of a shared
> filesystem, shared by all Lucene servers? If it's the latter, the bottleneck
> will be I/O ... anyway, elaborate on scalability please, and how you set it
> up
>
> 2. High availability. How would one go about making Lucene redundant?
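
For what it's worth, the split-and-aggregate idea from Russ's reply (query every partial index at the same time, then merge the results) could look roughly like the sketch below. It is plain Java with no Lucene dependency: the `Shard` interface, the `Hit` class and the `searchAll` method are hypothetical names made up for illustration, and the sketch assumes scores coming from different shards are directly comparable, which may not hold in practice when each index keeps its own term statistics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScatterGather {

    /** One search result: a document id plus its relevance score. */
    public static final class Hit {
        public final String docId;
        public final float score;

        public Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for one server that holds a slice of the index. */
    public interface Shard {
        List<Hit> search(String query, int topN) throws Exception;
    }

    /** Queries all shards in parallel and merges their results into one top-N list. */
    public static List<Hit> searchAll(List<Shard> shards, String query, int topN)
            throws InterruptedException, ExecutionException {
        if (shards.isEmpty()) {
            return new ArrayList<>();
        }
        // One thread per shard so that every shard is queried at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            // Scatter: submit the same query to each shard.
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                pending.add(pool.submit(() -> shard.search(query, topN)));
            }
            // Gather: wait for each shard and collect its hits.
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> result : pending) {
                merged.addAll(result.get());
            }
            // Aggregate: highest score first, then keep only the overall top N.
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }
}
```

Each shard only needs to return its own top N, since no document outside a shard's top N can make it into the overall top N. Lucene itself ships `MultiSearcher` and `ParallelMultiSearcher` for searching several indexes as one, so in a real setup those are probably a better starting point than hand-rolled merging.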