If you have RF=1, taking one node down will make 25% of your data
unavailable.  If you want to tolerate a machine going down you need at
least RF=2; if you want to use QUORUM and still survive a machine going
down, you need at least RF=3.
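To make the arithmetic concrete: a quorum is a majority of the replicas, i.e. floor(RF/2) + 1, so the number of dead nodes a QUORUM operation can tolerate is RF minus that. A quick sketch (mine, not from the original mail):

```python
def quorum(rf):
    # A quorum is a strict majority of the RF replicas for a row.
    return rf // 2 + 1

def tolerable_failures(rf):
    # Replicas that can be down while a QUORUM read/write still succeeds.
    return rf - quorum(rf)

for rf in (1, 2, 3):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"tolerates {tolerable_failures(rf)} node(s) down")
```

With RF=1 or RF=2 the answer is 0 nodes, which is why RF=3 is the usual minimum for QUORUM.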

On Tue, 2011-08-02 at 16:22 +0200, Patrik Modesto wrote:
> Hi all!
> 
> I have a test cluster of 4 nodes running Cassandra 0.7.8, with one
> keyspace with RF=1; each node owns 25% of the data. As long as all
> nodes are alive there is no problem, but when I shut down just one
> node I get UnavailableException in my application, cassandra-cli
> returns "null", and the Hadoop mapreduce task won't start at all.
> 
> Losing one node is not a problem for me, the data are not important;
> losing even half the cluster is not a problem as long as everything
> runs just as with a full cluster.
> 
> The error from hadoop is like this:
> Exception in thread "main" java.io.IOException: Could not get input splits
>         at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:120)
>         at cz.xxx.yyy.zzz.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:111)
>         at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:944)
>         at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:961)
>         at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
>         at cz.xxx.yyy.zzz.ContextIndexer.run(ContextIndexer.java:663)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at cz.xxx.yyy.zzz.ContextIndexer.main(ContextIndexer.java:94)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: java.util.concurrent.ExecutionException: java.io.IOException: failed connecting to all endpoints 10.0.18.87
>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>         at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:116)
>         ... 20 more
> Caused by: java.io.IOException: failed connecting to all endpoints 10.0.18.87
>         at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSubSplits(ColumnFamilyInputFormat.java:197)
>         at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.access$200(ColumnFamilyInputFormat.java:67)
>         at org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:153)
>         at org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:138)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
