On Fri, Jan 21, 2011 at 12:07 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> On Fri, Jan 21, 2011 at 2:19 AM, Mick Semb Wever <m...@apache.org> wrote:
>>
>>> Of course with a SAN you'd want RF=1 since it's replicating
>>> internally.
>>
>> Isn't this the same case for raid-5 as well?
>
> No, because the replication is (mainly) to protect you from machine
> failures; if the SAN is a SPOF then putting more replicas on it
> doesn't help.
>
>> And we want RF=2 if we need to keep reading while doing rolling
>> restarts?
>
> Yes.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

If you are using Cassandra with a SAN, RF=1 makes sense because we are assuming the SAN is already replicating your data. RF=2 makes good sense if you do not want to be affected by outages.

Another alternative is something like Linux-HA, managing each Cassandra instance as a resource. This way, if a head goes down, Linux-HA would detect the failure and bring up that instance on another physical piece of hardware. Using Linux-HA + SAN + Cassandra would actually bring Cassandra closer to the HBase model, in which you have a distributed file system and the front-end Cassandra node acts like a region server.
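
To make the RF=1 vs RF=2 trade-off during a rolling restart concrete, here is a rough sketch in Python. It is not Cassandra code; the function names `replicas_required` and `can_read` are made up for illustration, and it assumes the usual rule that QUORUM needs floor(RF/2) + 1 replicas while ONE needs a single live replica.

```python
def replicas_required(consistency_level: str, rf: int) -> int:
    """Number of replicas that must answer for a read to succeed (illustrative helper)."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return rf // 2 + 1  # majority of the replicas
    if consistency_level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def can_read(rf: int, replicas_down: int, consistency_level: str = "ONE") -> bool:
    """True if a key is still readable with `replicas_down` of its replicas offline."""
    live = max(rf - replicas_down, 0)
    return live >= replicas_required(consistency_level, rf)


if __name__ == "__main__":
    # Rolling restart: one replica of a given key is down at a time.
    for rf in (1, 2, 3):
        for cl in ("ONE", "QUORUM"):
            print(f"RF={rf} CL={cl} readable with one replica down: {can_read(rf, 1, cl)}")
```

Under these assumptions, RF=1 leaves a range unreadable while its only node restarts, RF=2 keeps reads working at ONE but not at QUORUM (which needs both replicas when RF=2), and RF=3 keeps QUORUM reads working too.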