Hello, I originally asked this question as a follow-up in a different thread, but I figure I should ask here instead in case the other one gets buried. Besides, I now have a little more information.
"We find the lack of performance disturbing", as we are only able to get about 3-4 MB/sec read performance out of Cassandra.

We are using Cassandra as the backend for an IR repository of digital texts. It is a read-mostly repository with occasional writes. Each row represents a book volume, and each column of a row represents a page of the volume. Granted, the data size is small: the average size of a column's text is 2-3 KB, and each row has about 250 columns (this varies quite a bit from one volume to another).

Currently we are running a 3-node cluster, which will soon be upgraded to a 6-node setup. Each node is a VM with 4 cores and 16 GB of memory. All VMs use a SAN for disk storage.

To retrieve a volume, we issue a slice query via Hector that specifies the row key (the volume) and a list of column keys (pages), with the consistency level set to ONE. It is typical to retrieve multiple volumes per request.

The read rate that I have been seeing is about 3-4 MB/sec, and that is reading the raw bytes. Using the string serializer the rate is even lower, about 2.2 MB/sec.

The server log shows that ParNew GC pauses frequently exceed 200 ms and are often in the range of 4-5 seconds, but they are nowhere near 15 seconds (which would be an indication that the JVM heap is being swapped out).

Currently we have not added JNA. From a blog post, it seems JNA is able to increase performance by 13%, and we are hoping to increase performance by something more like 1300% (3-4 MB/sec is just disturbingly low). We are also hesitant to disable swap entirely, since one of the nodes is running a couple of other services.

Do you have any suggestions on how we may boost the performance?

Thanks!
-- Y.
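
P.S. To put the read rate in perspective, here is a rough back-of-the-envelope sketch in Python that uses only the figures above. It assumes 1 MB = 1024 KB and treats 250 pages of 2-3 KB each as a typical volume; the function names are purely illustrative and not part of our actual code.

```python
# Back-of-the-envelope estimate based on the figures quoted in this post.
# Assumption: binary units (1 KB = 1024 bytes, 1 MB = 1024 KB).
KB = 1024
MB = 1024 * KB


def volume_bytes(pages: int, page_kb: float) -> float:
    """Approximate size of one volume (one row) in bytes."""
    return pages * page_kb * KB


def volumes_per_second(rate_mb_s: float, pages: int, page_kb: float) -> float:
    """Number of whole volumes a given read rate delivers per second."""
    return rate_mb_s * MB / volume_bytes(pages, page_kb)


if __name__ == "__main__":
    pages = 250  # typical number of columns (pages) per row
    # Best case: small pages at the higher rate; worst case: the reverse.
    for page_kb, rate in [(2.0, 4.0), (3.0, 3.0)]:
        size_kb = volume_bytes(pages, page_kb) / KB
        vps = volumes_per_second(rate, pages, page_kb)
        print(f"{page_kb} KB/page at {rate} MB/s: "
              f"{size_kb:.0f} KB/volume, {vps:.1f} volumes/s")
```

With these numbers a volume is roughly 500-750 KB, so 3-4 MB/sec works out to only about 4 to 8 volumes per second.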