Hi Peter,

Thanks for your detailed query...
I have an 8-machine cluster: KVSHIGH1,2,3,4 and KVSLOW1,2,3,4. As the names
suggest, the KVSLOWs have little disk space (~350 GB), whereas the KVSHIGHs
have 1.5 terabytes. Yet nodetool shows the following:

192.168.202.202  Down  319.94 GB  7200044730783885730400843868815072654   |<--|
192.168.202.4    Up    382.39 GB  23719654286404067863958492664769598669  |   ^
192.168.202.2    Up    106.81 GB  36701505058375526444137310055285336988  v   |
192.168.202.3    Up    149.81 GB  65098486053779167479528707238121707074  |   ^
192.168.202.201  Up    154.72 GB  79420606800360567885560534277526521273  v   |
192.168.202.204  Up    72.91 GB   85219217446418416293334453572116009608  |   ^
192.168.202.1    Up    29.78 GB   87632302962564279114105239858760976120  v   |
192.168.202.203  Up    9.35 GB    87790520647700936489181912967436646309  |-->|

As you can see, one of our KVSLOW boxes is already down; it is 100% full.
Meanwhile, a box with 1.5 terabytes holds only 29.78 GB (192.168.202.1)!

I'm using RandomPartitioner. When I run the client program, the Cassandra
daemon takes around 85-130% CPU.

Regards,
Rana

On Mon, Sep 27, 2010 at 2:31 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:

> > How can I handle this kind of situation?
>
> In terms of surviving the problem, a re-try on the client side might
> help assuming the problem is temporary.
>
> However, certainly the fact that you're seeing an issue to begin with
> is interesting, and the way to avoid it would depend on what the
> problem is. My understanding is that the UnavailableException
> indicates that the node you are talking to was unable to read
> from/write to a sufficient number of nodes to satisfy your consistency
> level. Presumably either because individual requests failed to return
> in time, or because the node considers other nodes to be flat out
> down.
>
> Can you correlate these issues with server-side activity on the nodes,
> such as background compaction, commitlog rotation or memtable
> flushing? Do you see your nodes saying that other nodes in the cluster
> are "DOWN" and "UP" (flapping)?
>
> How large is the data set in total (in terms of sstable size on disk),
> and how much memory do you have in your machines (going to page
> cache)?
>
> Have you observed the behavior of your nodes during compaction; in
> particular whether compaction is CPU bound or I/O bound? (That would
> tend to depend on data; generally the larger the individual values the
> more disk bound you'd tend to be.)
>
> Just trying to zero in on what the likely root cause is in this case.
>
> --
> / Peter Schuller
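
To put numbers on the imbalance in the nodetool output above, here is a rough sketch that computes how much of the RandomPartitioner token space each node owns. It assumes the usual RandomPartitioner layout: tokens run from 0 to 2**127, and each node is primary for the range from the previous node's token (exclusive) up to its own token (inclusive), wrapping around the ring. The function names `ownership` and `balanced_tokens` are made up for this illustration and are not part of Cassandra or nodetool. The load figures also depend on the replication factor and replica placement, so primary-range ownership is only part of the picture.

```python
from fractions import Fraction

# Assumed RandomPartitioner token space: 0 .. 2**127
RING_SIZE = 2 ** 127

# (address, token) pairs copied from the nodetool output in this mail
RING = [
    ("192.168.202.202", 7200044730783885730400843868815072654),
    ("192.168.202.4", 23719654286404067863958492664769598669),
    ("192.168.202.2", 36701505058375526444137310055285336988),
    ("192.168.202.3", 65098486053779167479528707238121707074),
    ("192.168.202.201", 79420606800360567885560534277526521273),
    ("192.168.202.204", 85219217446418416293334453572116009608),
    ("192.168.202.1", 87632302962564279114105239858760976120),
    ("192.168.202.203", 87790520647700936489181912967436646309),
]


def ownership(ring):
    """Return {address: fraction of the token space the node is primary for}."""
    ordered = sorted(ring, key=lambda node: node[1])
    shares = {}
    for i, (addr, token) in enumerate(ordered):
        # For i == 0 this picks the last node, i.e. the range wraps around.
        prev_token = ordered[i - 1][1]
        shares[addr] = Fraction((token - prev_token) % RING_SIZE, RING_SIZE)
    return shares


def balanced_tokens(node_count):
    """Evenly spaced tokens, shown only for comparison with the ring above."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for addr, share in ownership(RING).items():
        print(f"{addr:<16} {float(share):8.2%}")
    print("evenly spaced tokens for 8 nodes:")
    for token in balanced_tokens(8):
        print(f"  {token}")
```

Evenly spaced tokens give every node the same share, which may not be what you want when half the machines have 350 GB and the other half 1.5 TB, so treat the second function as a baseline for comparison rather than a recommendation.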