Hello,

I asked the question as a follow-up under a different thread, so I figure I
should ask here instead in case the other one gets buried, and besides, I
have a little more information.

"We find the lack of performance disturbing" as we are only able to get
about 3-4MB/sec read performance out of Cassandra.

We are using Cassandra as the backend for an IR repository of digital
texts. It is a read-mostly repository with occasional writes.  Each row
represents a book volume, and each column of a row represents a page of the
volume.  Granted the data size is small -- the average size of a column
text is 2-3KB, and each row has about 250 columns (varies quite a bit from
one volume to another).
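
For a rough sense of scale, the figures above put a typical volume well under 1 MB. A back-of-the-envelope sketch in Python, assuming binary kilobytes and treating 250 columns as a typical value (the function and constant names are illustrative only):

```python
# Back-of-the-envelope payload size per volume, using the figures quoted above.
KB = 1024  # assumption: 1 KB = 1024 bytes


def volume_size_bytes(page_kb, pages=250):
    """Approximate text payload of one volume (one row) in bytes."""
    return page_kb * KB * pages


low = volume_size_bytes(2)   # 2 KB pages
high = volume_size_bytes(3)  # 3 KB pages
print(f"{low / KB:.0f}-{high / KB:.0f} KB per volume")
```

Since the column count varies quite a bit between volumes, this is only an order-of-magnitude estimate.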

Currently we are running a 3-node cluster, which will soon be upgraded to a
6-node setup.  Each node is a VM with 4 cores and 16GB of memory.  All VMs
use SAN as disk storage.

To retrieve a volume, a slice query is used via Hector that specifies the
row key (the volume), and a list of column keys (pages), and the
consistency level is set to ONE.  It is typical to retrieve multiple
volumes per request.

The read rate that I have been seeing is about 3-4 MB/sec, and that is
reading the raw bytes... using string serializer the rate is even lower,
about 2.2MB/sec.
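
To make the raw-bytes versus string comparison reproducible, a small timing harness along these lines could be used. This is only a sketch: `measure_throughput` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in for the real client call, which in our case goes through Hector rather than Python.

```python
import time

MB = 1024 * 1024


def measure_throughput(fetch, keys, decode=None):
    """Return MB/sec for fetching every key.

    fetch(key) must return an iterable of bytes objects (one per page).
    decode, if given, is applied to each page to include decoding cost.
    """
    total = 0
    start = time.perf_counter()
    for key in keys:
        for page in fetch(key):
            total += len(page)
            if decode is not None:
                decode(page)
    elapsed = time.perf_counter() - start
    return total / MB / elapsed if elapsed > 0 else float("inf")


def fake_fetch(key):
    # Stand-in for the real slice query: 250 pages of 2 KB each.
    return [b"x" * 2048 for _ in range(250)]


raw = measure_throughput(fake_fetch, range(20))
decoded = measure_throughput(fake_fetch, range(20),
                             decode=lambda b: b.decode("utf-8"))
print(f"raw: {raw:.1f} MB/s, decoded: {decoded:.1f} MB/s")
```

With roughly 500-750 KB per volume (2-3 KB x 250 pages), 3-4 MB/sec works out to only about 4 to 8 volumes per second.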

The server log shows that ParNew GC pauses frequently exceed 200ms, often
falling in the range of 4-5 seconds, but nowhere near 15 seconds (which
would indicate that the JVM heap is being swapped out).

Currently we have not added JNA.  From a blog post, it seems JNA is able to
increase the performance by 13%, and we are hoping to increase the
performance by something more like 1300% (3-4 MB/sec is just disturbingly
low).  And we are hesitant to disable swap entirely since one of the nodes
is running a couple of other services.

Do you have any suggestions on how we may boost the performance?  Thanks!

-- Y.
