I have a client with 700 million doc index running on a SAN. The performance is v good but this obviously depends on your choice of SAN config. In this environment I have multiple search servers happily hitting the same physical lucene index on the SAN. The servers negotiate with each other via zookeeper to elect a server with write responsibilities.
One advantage of this approach over local disks is that index replication is relatively easy (replication is effectively being handled by SAN's internal striping mechanisms). Index replica synching as seen in Solr relies on careful shuffling of (hopefully) small segments of lucene indexes between servers across the network with mechanisms for dealing with server/network failures. Alternative local index synching approaches rely on replica servers independently building their own indexes from source data. Both of these replication strategies are arguably more fallible than a hardware-based striping mechanism in a sophisticated SAN. However the disadvantage with a SAN is the potential for the SAN to become a bottleneck or single point of failure (hardware or app-level index corruption). I'm sure high-end SAN vendors can talk up their ability to hot deploy extra disks to boost performance if needed and sophisticated failover mechanisms but these get expensive. If your client has invested significantly in a SAN and is pushing you to use it my experience is that this can work but benchmarking vs local disk is key. Cheers Mark On 26 Sep 2009, at 08:36, Matthias Hess <mat.h...@bluewin.ch> wrote: Hello We are currently implementing our first Lucene project. We are building an application which will index public Records on the internet, about 200'000 documents, each document is about 150 k in size. Our customer would like to store the Lucene index on a SAN disk. We recommended the use of high speed local disks, but our customer would prefer SAN for their better managability. Does anybody have good or bad experiences with SAN disks? Are there parameters like #read operations, data read rate or whatever, which must be met to have a performance which rivals a good "local disk"? Thanks for sharing your ideas and opinions! Kind Regards Matthias Hess --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org