> It does appear that I am IO bound. Disks show about 90% util.

Well, also pay attention to the average queue size column. If there are constantly more requests waiting to be serviced than you have platters, you're almost certainly I/O bound. The utilization number can be a bit flaky sometimes, although 90% does seem a bit too far below 100% to be attributed to inexactness in the kernel's measurements.
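In other words, with sysstat's iostat I would watch something like the below (the numbers here are made up, just to show which columns I mean) and keep an eye on avgqu-sz rather than only %util:

    $ iostat -x 5
    Device:   ...  avgqu-sz   await  svctm  %util
    sda       ...     18.7     41.2    2.1   90.3

An avgqu-sz that stays well above the number of spindles backing the device means requests are piling up faster than the disks can service them.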
> What are my options then? Is cassandra not suitable for columns of this
> size?

It depends. Cassandra is a log-structured database, meaning that all writes are sequential and you are going to be doing background compactions that imply re-reading and re-writing data. This optimization makes sense in particular for smaller values, where the cost of doing sequential I/O is a lot less than seek-bound I/O, but it is less relevant for large values.

The main "cost" of background compactions is the extra reading and writing of data that happens. If your workload is full of huge values, then the only significant cost *is* the sequential I/O. So in that sense, background compaction becomes more expensive relative to the theoretical optimum than it does for small values.

It depends on the details of the access pattern, but I'd say that (1) for very large values, Cassandra's advantages become less pronounced in terms of local storage on each node, although the clustering capabilities remain relevant, and (2) depending on the details of the use-case, Cassandra *may* not be terribly suitable.

> I am running stress code from hector which doesn't sound like give ability
> to do operations per sec. I am insert 1M rows and then reading. Have not
> been able to do in parallel because of io issues.

stress.py doesn't support any throttling, except very indirectly by limiting the total number of threads. In a situation like this I think you need to look at what your target traffic is going to be like. Throwing un-throttled traffic at the cluster like stress.py does is not indicative of normal traffic patterns. Typical use-cases with small columns are still handled well, but when you are both unthrottled *and* throwing huge columns at it, there is no expectation that this is handled very well.

So, for large values like this I recommend figuring out what the actual expected sustained rate of writes is, and then benchmarking that (there is a rough sketch of what I mean at the end of this mail). Using stress.py out-of-the-box is not giving you much relevant information, other than the known fact that throwing huge-column traffic at Cassandra without throttling is not handled very gracefully.

That said, with un-throttled benchmarking like stress.py, at any time when you're throwing more traffic at the cluster than it can handle, it is *fully expected* that you will see the 'active' stages saturated and a build-up of 'pending' operations. This is the expected result of submitting more requests per second than can be processed - in pretty much any system. You queue up to some degree, and eventually you start having to drop or fail requests.

The unique thing about large columns is that it becomes a lot easier to saturate a node with a single (or a few) stress.py clients than it is when stressing with a more normal type of load. The extra cost of dealing with large values is higher in Cassandra than it is in stress.py, so suddenly a single stress.py instance can easily saturate lots of nodes, simply because upping the column sizes lets you trivially write data at very high throughput.
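To make the "benchmark your expected sustained rate" suggestion concrete, here is a rough sketch of what I mean, in Python. It is not a drop-in script: insert_row() is just a placeholder for whatever client call you actually use (Hector, pycassa, raw Thrift, ...), and the rate and value-size constants are made-up numbers you would replace with your real expected traffic. The point is simply to pace submissions at the target rate and look at the latencies you get, rather than letting the client push as fast as it possibly can.

    # Rough sketch of a throttled write benchmark (placeholder client call).
    import os
    import time

    TARGET_WRITES_PER_SEC = 50          # made-up: your expected sustained rate
    VALUE_SIZE_BYTES = 2 * 1024 * 1024  # made-up: ~2 MB columns
    NUM_ROWS = 10000

    def insert_row(key, value):
        """Placeholder: replace with a real insert against your cluster."""
        pass

    def run():
        interval = 1.0 / TARGET_WRITES_PER_SEC
        payload = os.urandom(VALUE_SIZE_BYTES)
        latencies = []
        start = time.time()
        for i in range(NUM_ROWS):
            t0 = time.time()
            insert_row("row%d" % i, payload)
            latencies.append(time.time() - t0)
            # Pace against the wall clock so we submit at the target rate
            # instead of as fast as the client can go.
            next_due = start + (i + 1) * interval
            sleep_for = next_due - time.time()
            if sleep_for > 0:
                time.sleep(sleep_for)
        latencies.sort()
        elapsed = time.time() - start
        print("achieved %.1f writes/sec, p99 latency %.3fs" % (
            NUM_ROWS / elapsed, latencies[int(len(latencies) * 0.99)]))

    if __name__ == "__main__":
        run()

If the p99 latency stays sane at your real target rate (and iostat's queue depth stays reasonable while it runs), the cluster can keep up; if not, you know by roughly how much you need to scale out or shrink the values.

--
/ Peter Schuller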