Hi Aaron T.,

No, actually we haven't, but this sounds like a good suggestion. I can definitely try THIS before jumping into other things such as enabling the row cache. Thanks!
-- Y.

On Wed, May 16, 2012 at 9:38 PM, Aaron Turner <synfina...@gmail.com> wrote:

> On Wed, May 16, 2012 at 12:59 PM, Yiming Sun <yiming....@gmail.com> wrote:
> > Hello,
> >
> > I asked the question as a follow-up under a different thread, so I figure I
> > should ask here instead in case the other one gets buried, and besides, I
> > have a little more information.
> >
> > "We find the lack of performance disturbing" as we are only able to get
> > about 3-4MB/sec read performance out of Cassandra.
> >
> > We are using cassandra as the backend for an IR repository of digital texts.
> > It is a read-mostly repository with occasional writes. Each row represents
> > a book volume, and each column of a row represents a page of the volume.
> > Granted the data size is small -- the average size of a column text is
> > 2-3KB, and each row has about 250 columns (varies quite a bit from one
> > volume to another).
> >
> > Currently we are running a 3-node cluster, and will soon be upgraded to a
> > 6-node setup. Each node is a VM with 4 cores and 16GB of memory. All VMs
> > use SAN as disk storage.
> >
> > To retrieve a volume, a slice query is used via Hector that specifies the
> > row key (the volume), and a list of column keys (pages), and the consistency
> > level is set to ONE. It is typical to retrieve multiple volumes per
> > request.
> >
> > The read rate that I have been seeing is about 3-4 MB/sec, and that is
> > reading the raw bytes... using string serializer the rate is even lower,
> > about 2.2MB/sec.
> >
> > The server log shows the GC ParNew frequently gets longer than 200ms, often
> > in the range of 4-5 seconds. But nowhere near 15 seconds (which is an
> > indication that JVM heap is being swapped out).
> >
> > Currently we have not added JNA. From a blog post, it seems JNA is able to
> > increase the performance by 13%, and we are hoping to increase the
> > performance by something more like 1300% (3-4 MB/sec is just disturbingly
> > low). And we are hesitant to disable swap entirely since one of the nodes
> > is running a couple other services
> >
> > Do you have any suggestions on how we may boost the performance? Thanks!
>
> Have you tried using more threads on the client side? Generally
> speaking, when I need faster read/write performance I look for ways to
> parallelize my requests and it scales pretty much linearly.
>
> --
> Aaron Turner
> http://synfin.net/ Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
> -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
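
For anyone reading this thread later, the client-side parallelism suggested above can be sketched roughly as follows. This is a minimal Python illustration, not Hector code: `fetch_volume` is a hypothetical stand-in for one blocking slice query, and the 50 ms latency, worker count, and page sizes are assumptions chosen only to make the effect visible. The idea is that when each request mostly waits on network and disk, issuing several at once should raise throughput until the cluster or the client saturates.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_volume(volume_id, latency=0.05, pages=250, page_size=2048):
    """Hypothetical stand-in for one blocking slice query.

    Sleeps to simulate the network/disk round trip, then returns a list
    of dummy page payloads (one bytes object per column).
    """
    time.sleep(latency)
    return [bytes(page_size) for _ in range(pages)]


def fetch_serial(volume_ids, **kw):
    """Fetch volumes one after another on a single thread."""
    return {v: fetch_volume(v, **kw) for v in volume_ids}


def fetch_parallel(volume_ids, workers=8, **kw):
    """Fetch volumes concurrently with a fixed-size thread pool."""
    ids = list(volume_ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each id with its result.
        results = pool.map(lambda v: fetch_volume(v, **kw), ids)
        return dict(zip(ids, results))


def timed(fn, *args, **kw):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    out = fn(*args, **kw)
    return out, time.perf_counter() - start


if __name__ == "__main__":
    ids = [f"vol-{i}" for i in range(16)]
    _, t_serial = timed(fetch_serial, ids)
    _, t_parallel = timed(fetch_parallel, ids, workers=8)
    print(f"serial:   {t_serial:.2f}s")
    print(f"parallel: {t_parallel:.2f}s")
```

In a real client the pool size would need tuning against the cluster, and the near-linear scaling mentioned above only holds while the servers have spare capacity.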