Re: how can we get (a lot) more performance from cassandra

aaron morton Sun, 20 May 2012 16:29:53 -0700

I would look into the problems you are having with GC...

> The server log shows that GC ParNew pauses frequently take longer than 200 ms,
> often in the range of 4-5 seconds. But they are nowhere near 15 seconds (which
> would indicate that the JVM heap is being swapped out).


Then check the throughput on the SAN and the CPU steal time on the VMs.

Also, try to isolate the issue down to a single measurement: "it takes this
long for a single thread to make this call."
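A minimal Java sketch of that single-thread measurement might look like this. It does not use Hector at all: `fetchVolume` is a hypothetical stand-in (it sleeps briefly and returns a buffer roughly the size of one volume) that you would swap for the real slice query. Timing sequential calls on one thread and reporting median and tail latency separates per-call cost from any client-side concurrency effects.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class SingleThreadLatency {

    /** Runs `calls` sequential invocations on the current thread; returns each latency in ms. */
    static double[] timeCalls(Supplier<byte[]> call, int calls) {
        double[] latenciesMs = new double[calls];
        for (int i = 0; i < calls; i++) {
            long start = System.nanoTime();
            call.get();
            latenciesMs[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return latenciesMs;
    }

    /** Nearest-rank percentile (p in 0..100) computed on a sorted copy of the input. */
    static double percentile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    // Hypothetical placeholder for one Hector slice query on a single volume row:
    // sleeps 5 ms and returns ~250 pages of ~2.5 KB. Replace with the real call.
    static byte[] fetchVolume() {
        try {
            TimeUnit.MILLISECONDS.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new byte[250 * 2500];
    }

    public static void main(String[] args) {
        double[] lat = timeCalls(SingleThreadLatency::fetchVolume, 50);
        System.out.printf("p50=%.1f ms  p99=%.1f ms  max=%.1f ms%n",
                percentile(lat, 50), percentile(lat, 99), percentile(lat, 100));
    }
}
```

If the single-thread median is already high, the cause is more likely on the server side (GC, SAN, VM steal) than in the client.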

In a low-write environment, reads should be flying along.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/05/2012, at 1:44 PM, Yiming Sun wrote:

> Hi Aaron T., no, actually we haven't, but this sounds like a good
> suggestion. I can definitely try THIS before jumping into other things, such
> as enabling the row cache. Thanks!
> 
> -- Y.
> 
> On Wed, May 16, 2012 at 9:38 PM, Aaron Turner <synfina...@gmail.com> wrote:
> On Wed, May 16, 2012 at 12:59 PM, Yiming Sun <yiming....@gmail.com> wrote:
> > Hello,
> >
> > I asked this question as a follow-up in a different thread, but I figured I
> > should ask here as well in case the other one gets buried. Besides, I now
> > have a little more information.
> >
> > "We find the lack of performance disturbing," as we are only able to get
> > about 3-4 MB/sec of read performance out of Cassandra.
> >
> > We are using Cassandra as the backend for an IR repository of digital texts.
> > It is a read-mostly repository with occasional writes. Each row represents
> > a book volume, and each column of a row represents a page of the volume.
> > Granted, the data size is small: the average column text is 2-3 KB, and
> > each row has about 250 columns (this varies quite a bit from one volume to
> > another).
> >
> > We are currently running a 3-node cluster, which will soon be upgraded to a
> > 6-node setup. Each node is a VM with 4 cores and 16 GB of memory. All VMs
> > use a SAN for disk storage.
> >
> > To retrieve a volume, we use a Hector slice query that specifies the row key
> > (the volume) and a list of column keys (pages), with the consistency level
> > set to ONE. It is typical to retrieve multiple volumes per request.
> >
> > The read rate I have been seeing is about 3-4 MB/sec, and that is when
> > reading the raw bytes; using the string serializer, the rate is even lower,
> > about 2.2 MB/sec.
> >
> > The server log shows that GC ParNew pauses frequently take longer than
> > 200 ms, often in the range of 4-5 seconds. But they are nowhere near
> > 15 seconds (which would indicate that the JVM heap is being swapped out).
> >
> > Currently we have not added JNA. From a blog post, it seems JNA can
> > increase performance by 13%, whereas we are hoping for an improvement of
> > something more like 1300% (3-4 MB/sec is just disturbingly low). We are
> > also hesitant to disable swap entirely, since one of the nodes is running a
> > couple of other services.
> >
> > Do you have any suggestions on how we may boost the performance?  Thanks!
> 
> Have you tried using more threads on the client side? Generally
> speaking, when I need faster read/write performance, I look for ways to
> parallelize my requests, and it scales pretty much linearly.
> 
> 
> --
> Aaron Turner
> http://synfin.net/         Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & 
> Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
>     -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
>
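Aaron Turner's suggestion can be sketched in Java as a fixed-size thread pool that issues one fetch per volume in parallel. As a rough check on the figures above: 250 pages of 2-3 KB is about 500-750 KB per volume, so 3-4 MB/sec is only about 4-8 volumes per second. In this sketch, `fakeFetch` is a hypothetical placeholder (it sleeps 20 ms and returns a buffer of about one volume's size), not Hector's API; the real slice query would go in its place, and whatever client object it uses must be safe to call from several threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class ParallelVolumeReader {

    /** Fetches every volume key on a fixed-size pool and returns the total bytes read. */
    static long fetchAll(List<String> volumeKeys, int threads, Function<String, byte[]> fetch)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (String key : volumeKeys) {
                futures.add(pool.submit(() -> fetch.apply(key))); // one task per volume
            }
            long totalBytes = 0;
            for (Future<byte[]> f : futures) {
                totalBytes += f.get().length; // rethrows any per-volume failure
            }
            return totalBytes;
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical placeholder for one slice query: 20 ms of simulated latency,
    // then ~250 pages of ~2.5 KB each. Replace with the real client call.
    static byte[] fakeFetch(String volumeKey) {
        try {
            TimeUnit.MILLISECONDS.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new byte[250 * 2500];
    }

    public static void main(String[] args) throws Exception {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 32; i++) keys.add("volume-" + i);

        // Sweep the thread count and report aggregate throughput for each setting.
        for (int threads : new int[] {1, 4, 16}) {
            long start = System.nanoTime();
            long bytes = fetchAll(keys, threads, ParallelVolumeReader::fakeFetch);
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%2d threads: %.1f MB/s%n", threads, bytes / 1e6 / seconds);
        }
    }
}
```

Because the placeholder only sleeps, the speedup it prints is close to linear by construction; against a real cluster, the thread count at which throughput stops improving is the number worth finding.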
