Hi Yiming,
Cassandra is optimized for write-heavy environments.
If you have a read-heavy application, you shouldn't be running your
reads through Cassandra.
On the bright side, Cassandra's read throughput will remain consistent
regardless of your volume. But you are going to have to "wrap" your
reads with memcache (or Redis) so that the bulk of your reads can be
served from memory.
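A minimal cache-aside sketch of the "wrap your reads" idea, written in Java since the thread uses Hector. The `HashMap` stands in for memcache or Redis, and the loader function stands in for the Cassandra read. `CacheAsideReader` and both placeholders are hypothetical illustrations, not a real client API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Cache-aside read path. The Map is a hypothetical stand-in for memcache/Redis;
 * the backend function is a hypothetical stand-in for a Cassandra read.
 */
final class CacheAsideReader<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> backend;
    private int backendReads = 0;

    CacheAsideReader(Function<K, V> backend) {
        this.backend = backend;
    }

    /** Serve from the cache when possible; on a miss, read the backend and populate the cache. */
    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        V value = backend.apply(key);
        backendReads++;
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    /** Drop a cached entry after a write so the next read fetches fresh data. */
    void invalidate(K key) {
        cache.remove(key);
    }

    /** Number of reads that actually reached the backend. */
    int backendReads() {
        return backendReads;
    }

    public static void main(String[] args) {
        // Hypothetical backend: pretend each volume lookup is an expensive Cassandra read.
        CacheAsideReader<String, String> reader =
                new CacheAsideReader<>(volumeId -> "contents of " + volumeId);
        reader.get("vol-1");
        reader.get("vol-1"); // served from the cache
        reader.get("vol-2");
        System.out.println("backend reads: " + reader.backendReads()); // prints "backend reads: 2"
    }
}
```

A real cache tier would add expiry and eviction, be shared across application servers, and be thread-safe; this sketch has none of that and only shows the read path: check the cache, fall back to Cassandra on a miss, then populate the cache. Calling `invalidate` after one of the occasional writes keeps stale pages from being served.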
Thanks,
Mike Peters
On 5/16/2012 3:59 PM, Yiming Sun wrote:
Hello,
I asked this question as a follow-up under a different thread, but I
figured I should ask it here instead in case the other one gets buried.
Besides, I now have a little more information.
"We find the lack of performance disturbing," as we are only able to
get about 3-4 MB/sec of read throughput out of Cassandra.
We are using Cassandra as the backend for an IR repository of digital
texts. It is a read-mostly repository with occasional writes. Each
row represents a book volume, and each column of a row represents a
page of the volume. Granted, the data size is small: the average
column text is 2-3 KB, and each row has about 250 columns (this
varies quite a bit from one volume to another).
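A rough estimate from the numbers above, assuming 1 KB = 1024 bytes and 1 MB = 1024 KB, shows what 3-4 MB/sec means in whole volumes: 250 pages of 2-3 KB is roughly 500-750 KB per volume, so the cluster is delivering only about 4-8 volumes per second. `VolumeReadEstimate` is a hypothetical helper for the arithmetic only.

```java
/** Back-of-the-envelope volume size and volumes/sec from the figures in the email. */
final class VolumeReadEstimate {
    static final double KB = 1024.0;
    static final double MB = 1024.0 * KB;

    /** Approximate bytes in one volume row: pages per volume times average page size. */
    static double volumeBytes(int pagesPerVolume, double pageKb) {
        return pagesPerVolume * pageKb * KB;
    }

    /** Whole volumes per second that a given read rate can deliver. */
    static double volumesPerSecond(double readMbPerSec, double bytesPerVolume) {
        return readMbPerSec * MB / bytesPerVolume;
    }

    public static void main(String[] args) {
        double small = volumeBytes(250, 2.0); // 2 KB pages
        double large = volumeBytes(250, 3.0); // 3 KB pages
        System.out.printf("Volume size: %.0f-%.0f KB%n", small / KB, large / KB);
        for (double rate : new double[] {3, 4}) {
            System.out.printf("At %.0f MB/sec: %.1f-%.1f volumes/sec%n",
                    rate, volumesPerSecond(rate, large), volumesPerSecond(rate, small));
        }
    }
}
```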
We are currently running a 3-node cluster and will soon upgrade
to a 6-node setup. Each node is a VM with 4 cores and 16 GB of memory.
All VMs use a SAN for disk storage.
To retrieve a volume, we use a Hector slice query that specifies
the row key (the volume) and a list of column keys (pages), with the
consistency level set to ONE. A single request typically retrieves
multiple volumes.
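The data model and read pattern described above can be pictured as an in-memory map: row key (volume) to a sorted map of column name (page) to page bytes, read by a list of column names, and repeated for several volumes per request. In Hector these reads correspond roughly to `SliceQuery` and, for several row keys at once, `MultigetSliceQuery`. `VolumeStore` below is a hypothetical illustration of the shape of the query, not Hector or Cassandra code, and says nothing about performance.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical in-memory model: volume row -> sorted page columns -> page text bytes. */
final class VolumeStore {
    private final Map<String, SortedMap<String, byte[]>> rows = new TreeMap<>();

    void putPage(String volumeKey, String pageKey, byte[] text) {
        rows.computeIfAbsent(volumeKey, k -> new TreeMap<>()).put(pageKey, text);
    }

    /** Slice by column names: return the requested pages of one volume, skipping missing ones. */
    Map<String, byte[]> slice(String volumeKey, List<String> pageKeys) {
        Map<String, byte[]> result = new LinkedHashMap<>();
        SortedMap<String, byte[]> row = rows.get(volumeKey);
        if (row == null) {
            return result;
        }
        for (String page : pageKeys) {
            byte[] text = row.get(page);
            if (text != null) {
                result.put(page, text);
            }
        }
        return result;
    }

    /** Several volumes in one call, mirroring "multiple volumes per request". */
    Map<String, Map<String, byte[]>> multiSlice(List<String> volumeKeys, List<String> pageKeys) {
        Map<String, Map<String, byte[]>> result = new LinkedHashMap<>();
        for (String volume : volumeKeys) {
            result.put(volume, slice(volume, pageKeys));
        }
        return result;
    }

    public static void main(String[] args) {
        VolumeStore store = new VolumeStore();
        store.putPage("vol-1", "p0001", "page one".getBytes(StandardCharsets.UTF_8));
        store.putPage("vol-1", "p0002", "page two".getBytes(StandardCharsets.UTF_8));
        store.putPage("vol-2", "p0001", "another book".getBytes(StandardCharsets.UTF_8));
        Map<String, Map<String, byte[]>> pages =
                store.multiSlice(List.of("vol-1", "vol-2"), List.of("p0001", "p0002"));
        // prints "vol-1 -> [p0001, p0002]" then "vol-2 -> [p0001]"
        pages.forEach((vol, cols) -> System.out.println(vol + " -> " + cols.keySet()));
    }
}
```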
The read rate I have been seeing is about 3-4 MB/sec, and that is
when reading the raw bytes; with the string serializer the rate is
even lower, about 2.2 MB/sec.
The server log shows that ParNew GC pauses frequently exceed 200 ms,
often reaching 4-5 seconds, but nowhere near 15 seconds (which would
indicate that the JVM heap is being swapped out).
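GC pause figures like these can be sampled through the standard `java.lang.management` API: take each collector's cumulative count and time, wait, and diff. The sketch below reports only for the JVM it runs in; watching Cassandra's own ParNew collector would require running it inside the server JVM or over a remote JMX connection, which is not shown. `GcReport` and the one-second interval are hypothetical choices, and an average over an interval can hide a single long pause.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

/** Samples per-collector GC count/time deltas for the current JVM. */
final class GcReport {
    /** Average ms per collection over an interval; 0 if nothing ran or values are undefined (-1). */
    static double averagePauseMillis(long collections, long elapsedMillis) {
        if (collections <= 0 || elapsedMillis < 0) {
            return 0.0;
        }
        return (double) elapsedMillis / collections;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, long[]> before = new HashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            before.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        Thread.sleep(1000); // hypothetical sampling interval; a monitor would poll repeatedly
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long[] prev = before.getOrDefault(gc.getName(), new long[] {0, 0});
            long count = gc.getCollectionCount() - prev[0];
            long millis = gc.getCollectionTime() - prev[1];
            System.out.printf("%s: %d collections, %d ms, %.1f ms average%n",
                    gc.getName(), count, millis, averagePauseMillis(count, millis));
        }
    }
}
```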
We have not added JNA yet. According to a blog post, JNA can
improve performance by about 13%, but we are hoping for something
more like 1300% (3-4 MB/sec is just disturbingly low). We are also
hesitant to disable swap entirely, since one of the nodes is running
a couple of other services.
Do you have any suggestions on how we might boost performance? Thanks!
-- Y.