> It does appear that I am IO bound. Disks show about 90% util.

Well, also pay attention to the average queue size column. If there
are constantly more requests waiting to be serviced than you have
platters, you're almost certainly I/O bound. The utilization number
can be a bit flaky sometimes, although 90% isn't so far below 100%
that the difference can't simply be inexactness in the kernel's
measurements.
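
(For reference: in extended iostat output - e.g. "iostat -x 1" - the
queue size is the avgqu-sz column, and await shows the average time a
request spends waiting plus being serviced. If avgqu-sz stays above
your spindle count and await keeps climbing, you're I/O bound no
matter what %util reads.)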

> What are my options then? Is cassandra not suitable for columns of this
> size?

It depends. Cassandra is a log-structured database, meaning that all
writes are sequential and you are going to be doing background
compactions that imply re-reading and re-writing data.

This optimization makes sense in particular for smaller values where
the cost of doing sequential I/O is a lot less than seek-bound I/O,
but it is less relevant for large values.
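
To put very rough numbers on that (purely illustrative; I'm assuming
something like an 8 ms average seek and 100 MB/s sequential throughput
for a single commodity disk):

    # Illustrative sketch only - assumed disk characteristics, not measured.
    SEEK_MS = 8.0           # assumed average seek + rotational latency
    SEQ_MB_PER_S = 100.0    # assumed sequential throughput

    def transfer_ms(size_bytes):
        return size_bytes / (SEQ_MB_PER_S * 1024 * 1024) * 1000

    for size in (200, 10 * 1024 * 1024):           # tiny value vs. 10 MB value
        random_io = SEEK_MS + transfer_ms(size)    # one seek-bound access
        sequential = transfer_ms(size)             # part of a sequential pass
        print(size, round(random_io, 3), round(sequential, 3))

For a ~200 byte value the seek (8 ms) completely dominates the ~0.002 ms
of transfer, so turning random I/O into sequential I/O is a huge win.
For a 10 MB value the transfer itself is ~100 ms either way, so the
sequential-vs-seek distinction largely stops mattering.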

The main "cost" of background compactions is the extra reading and
writing of data that happens. If your workload is full of huge values,
then the only significant cost *is* the sequential I/O. So in that
sense, background compaction becomes more expensive relative to the
theoretical optimum than it does for small values.
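
As a back-of-the-envelope example (numbers purely hypothetical): if a
value ends up being rewritten by four compactions over its lifetime, the
node does roughly 5x the sequential I/O of simply writing it once. For
tiny values that 5x is still negligible next to the seeks needed to
access them, but for huge values the sequential transfer already *was*
essentially the whole cost, so the 5x overhead is felt in full.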

It depends on details of the access pattern, but I'd say that (1) for
very large values, Cassandra's advantages become less pronounced in
terms of local storage on each node, although the clustering
capabilities remain relevant, and that (2) depending on the details of
the use-case, Cassandra *may* not be terribly suitable.

> I am running stress code from hector, which doesn't seem to give the
> ability to limit operations per sec. I am inserting 1M rows and then
> reading. I have not been able to do both in parallel because of IO issues.

stress.py doesn't support any throttling, except very very indirectly
by limiting the total number of threads.

In a situation like this I think you need to look at what your target
traffic is going to be like. Throwing un-throttled traffic at the
cluster like stress.py does is not indicative of normal traffic
patterns. For typical use-cases with small columns this is still
handled well, but when you are both unthrottled *and* throwing huge
columns at it, there is no expectation that it will be handled very
well.

So, for large values like this I recommend figuring out what your
actual expected sustained write rate is, and then benchmarking
that. Using stress.py out-of-the-box does not give you much relevant
information, other than the known fact that throwing huge-column
traffic at Cassandra without throttling is not handled very
gracefully.
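
As a minimal sketch of what I mean - the client object and insert call
here are hypothetical placeholders (substitute your hector code); the
only point is pacing the writes at your expected sustained rate:

    import time

    TARGET_OPS_PER_SEC = 50              # your expected sustained write rate
    VALUE = b"x" * (10 * 1024 * 1024)    # the column size you actually expect

    def run(client, num_rows):
        """Insert num_rows values, throttled to TARGET_OPS_PER_SEC."""
        interval = 1.0 / TARGET_OPS_PER_SEC
        start = time.time()
        for i in range(num_rows):
            client.insert(str(i), VALUE)  # hypothetical client API
            # Sleep off whatever time remains in this operation's slot.
            delay = start + (i + 1) * interval - time.time()
            if delay > 0:
                time.sleep(delay)

That gives you latency and resource usage numbers for traffic you might
actually see, rather than for an unbounded firehose.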

But that said, with un-throttled benchmarking like stress.py, whenever
you're throwing more traffic at the cluster than it can handle, it is
*fully expected* that you will see the 'active' stages saturated and a
build-up of 'pending' operations. That is the expected result of
submitting a greater number of requests per second than can be
processed - in pretty much any system. You queue up to some degree, and
eventually you start having to drop or fail requests.

The unique thing about large columns is that it becomes a lot easier
to saturate a node with a single (or few) stress.py clients than it is
when stressing with a more normal type of load. The extra cost of
dealing with large values is higher in Cassandra than it is in
stress.py; so suddenly a single stress.py client can easily saturate
lots of nodes, simply because upping the column size lets you write
data at very high throughput with trivial client-side effort.
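
As an illustration (numbers made up): a single client writing 10 MB
columns at a modest 100 inserts/second is already generating roughly
1 GB/s of raw data before replication, which is more sequential
bandwidth than a handful of commodity disks can absorb, whereas with
small columns that same client would need hundreds of thousands of
operations per second to push a comparable volume.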

-- 
/ Peter Schuller
