I have a client with a 700-million-document index running on a SAN. The
performance is very good, but this obviously depends on your choice of SAN
configuration. In this environment I have multiple search servers happily
hitting the same physical Lucene index on the SAN. The servers negotiate with
each other via ZooKeeper to elect a single server with write responsibilities.
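
For illustration only, since the details of the election are not described
here: one common ZooKeeper recipe has each server create an ephemeral
sequential znode, and the server holding the lowest sequence number becomes the
writer. The sketch below shows just that selection step in plain Java, with no
ZooKeeper client involved. The class name, method names and znode names are
hypothetical; the only ZooKeeper behaviour assumed is that sequential znodes
get a 10-digit zero-padded counter appended to their name.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class WriterElection {

    /** Parses the 10-digit counter that ZooKeeper appends to sequential znodes. */
    static long sequenceOf(String znodeName) {
        // Assumes the name is at least 10 characters long and ends in digits.
        return Long.parseLong(znodeName.substring(znodeName.length() - 10));
    }

    /** Returns the candidate with the lowest sequence number, if there is one. */
    static Optional<String> electWriter(List<String> candidates) {
        return candidates.stream()
                .min(Comparator.comparingLong(WriterElection::sequenceOf));
    }

    public static void main(String[] args) {
        // Hypothetical children of an election znode, in arbitrary order.
        List<String> nodes = Arrays.asList(
                "search-b_0000000007",
                "search-a_0000000003",
                "search-c_0000000012");
        System.out.println(electWriter(nodes).orElse("none"));
    }
}
```

Whatever the mechanism, the point of the election is that only one server at a
time opens an IndexWriter on the shared index, since Lucene allows a single
writer per index.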

One advantage of this approach over local disks is that index replication is
relatively easy, because replication is effectively handled by the SAN's
internal striping mechanisms. Index replica syncing as seen in Solr relies on
careful shuffling of (hopefully) small Lucene index segments between servers
across the network, with mechanisms for dealing with server and network
failures. Alternative local index syncing approaches rely on replica servers
independently building their own indexes from source data.
Both of these replication strategies are arguably more fallible than a
hardware-based striping mechanism in a sophisticated SAN.
  
However, the disadvantage of a SAN is its potential to become a bottleneck or
a single point of failure (through hardware failure or app-level index
corruption). I'm sure high-end SAN vendors can talk up their ability to
hot-deploy extra disks to boost performance if needed, and their sophisticated
failover mechanisms, but these get expensive.

If your client has invested significantly in a SAN and is pushing you to use
it, my experience is that this can work, but benchmarking against local disk is
key.
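
As a rough starting point for that comparison, here is a minimal sketch that
times random block reads against a file, for example a large index file copied
to both the SAN volume and a local disk. The class name, block size and read
count are arbitrary choices for illustration, not recommendations, and this
only measures raw read latency; it is no substitute for replaying real queries
against the real index.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Random;

public class RandomReadBench {

    /** Performs `reads` random block reads and returns the mean latency in microseconds. */
    static double meanReadMicros(Path file, int reads, int blockSize, long seed)
            throws IOException {
        Random rnd = new Random(seed);
        ByteBuffer buf = ByteBuffer.allocate(blockSize);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size < blockSize) {
                throw new IllegalArgumentException("file is smaller than one block");
            }
            long span = size - blockSize + 1; // number of valid start offsets
            long start = System.nanoTime();
            for (int i = 0; i < reads; i++) {
                long pos = (long) (rnd.nextDouble() * span);
                buf.clear();
                ch.read(buf, pos); // positional read, does not move the channel position
            }
            return (System.nanoTime() - start) / 1000.0 / reads;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass one file per storage type, e.g. one on the SAN and one on local disk.
        for (String arg : args) {
            Path p = Paths.get(arg);
            if (!Files.isRegularFile(p)) {
                System.out.println(arg + ": not a regular file, skipped");
                continue;
            }
            System.out.printf("%s: %.1f us per 4 KiB read%n",
                    arg, meanReadMicros(p, 10_000, 4096, 42L));
        }
    }
}
```

Bear in mind that the OS page cache will flatter both numbers unless the test
file is much larger than RAM or the cache is cold, so treat the results as
relative rather than absolute.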

Cheers
Mark


On 26 Sep 2009, at 08:36, Matthias Hess <mat.h...@bluewin.ch> wrote:

Hello

We are currently implementing our first Lucene project. We are building an
application which will index public records on the internet: about 200'000
documents, each about 150 k in size. Our customer would like to store the
Lucene index on a SAN disk.
We recommended the use of high-speed local disks, but our customer would prefer
a SAN for its better manageability.

Does anybody have good or bad experiences with SAN disks?
Are there parameters, such as the number of read operations or the data read
rate, which must be met to get performance that rivals a good "local disk"?

Thanks for sharing your ideas and opinions!

Kind Regards
Matthias Hess

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





