Re: Cassandra read optimization

Dan Feldman Thu, 19 Apr 2012 00:50:06 -0700

Hi Paolo,

Thanks for the hint - JNA indeed wasn't installed. However, now that
Cassandra is actually using it, there doesn't seem to be any change in
speed - still 7 seconds with pycassa.

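For scale, the timings in this thread can be turned into rough throughput
numbers. The sketch below assumes the figures quoted further down (5000
super columns per slice, ~800 bytes per super column) and ignores any
protocol overhead; the function name is made up for illustration.

```python
# Figures quoted in the thread; the 800-byte size is the poster's estimate.
SUPER_COLUMNS_PER_SLICE = 5000
BYTES_PER_SUPER_COLUMN = 800


def throughput_kb_per_s(seconds,
                        n=SUPER_COLUMNS_PER_SLICE,
                        size=BYTES_PER_SUPER_COLUMN):
    """Approximate payload throughput in kB/s (1 kB = 1000 bytes)."""
    return n * size / float(seconds) / 1000.0


# Slice times reported in the thread, in seconds.
timings = [("ruby", 13), ("pycassa, before JNA", 8), ("pycassa, after JNA", 7)]
for label, seconds in timings:
    print("%-20s %6.0f kB/s" % (label, throughput_kb_per_s(seconds)))
```

Under those assumptions every reported timing works out to somewhere
between roughly 300 and 570 kB/s.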

On Thu, Apr 19, 2012 at 12:14 AM, Paolo Bernardi <berna...@gmail.com> wrote:

> Look into your Cassandra logs to see if JNA is really enabled (it
> really should be, by default) and, more importantly, if JNA is loaded
> correctly. You might find a surprising message there: if this
> is the case, just install JNA with your distro's package manager and,
> if it still doesn't work, copy the JNA jar into Cassandra's lib directory
> (been there, done that).
>
> Paolo
>
> On Thu, Apr 19, 2012 at 8:26 AM, Dan Feldman <hriunde...@gmail.com> wrote:
> > Hi Tyler and Aaron,
> >
> > Thanks for your replies.
> >
> > Tyler,
> > fetching scs using your pycassa script on our server takes ~7 s -
> > consistent with the times we've been seeing. Now, we aren't really
> > experts in Cassandra, but it seems that JNA is enabled by default for
> > Cassandra > 1.0 according to Jeremy
> > (http://comments.gmane.org/gmane.comp.db.cassandra.user/21441). But in
> > case it isn't, how do you turn it on in 1.0.8?
> >
> > I'm also setting MAX_HEAP_SIZE="2G" in cassandra-env.sh. I'm hoping
> > that's how you increase the Java heap size. I've tried "3G" as well,
> > without any increase in performance. It did, however, allow for taking
> > larger slices.
> >
> > Aaron,
> > we are not doing multi-threaded requests for now, but we'll give it a
> > shot in the next day or two and I'll let you know if there is any
> > improvement.
> >
> > Thanks for your help!
> > Dan F.
> >
> >
> >
> > On Wed, Apr 18, 2012 at 9:44 PM, Tyler Hobbs <ty...@datastax.com> wrote:
> >>
> >> I tested this out with a small pycassa script:
> >> https://gist.github.com/2418598
> >>
> >> On my not-very-impressive laptop, I can read 5000 of the super columns
> >> in 3 seconds (cold) or 1.5 (warm).  Reading in batches of 1000 super
> >> columns at a time gives much better performance; I definitely recommend
> >> going with a smaller batch size.
> >>
> >> Make sure that the timeout on your ConnectionPool isn't too low to
> >> handle a big request in pycassa.  If you turn on logging (as it is in
> >> the script I linked), you should be able to see if the request is
> >> timing out a couple of times before it succeeds.
> >>
> >> It might also be good to make sure that you've got JNA in place and your
> >> heap size is sufficient.
> >>
> >>
> >> On Wed, Apr 18, 2012 at 8:59 PM, Aaron Turner <synfina...@gmail.com>
> >> wrote:
> >>>
> >>> On Wed, Apr 18, 2012 at 5:00 PM, Dan Feldman <hriunde...@gmail.com>
> >>> wrote:
> >>> > Hi all,
> >>> >
> >>> > I'm trying to optimize moving data from Cassandra to HDFS using
> >>> > either a Ruby or a Python client. Right now, I'm playing around on
> >>> > my staging server, an 8 GB single-node machine. My data in Cassandra
> >>> > (1.0.8) consist of 2 rows (for now) with ~150k super columns each (I
> >>> > know, I know - super columns are bad). Every super column has ~25
> >>> > columns totaling ~800 bytes per super column.
> >>> >
> >>> > I should also mention that currently the database is static - there
> >>> > are no writes/updates, only reads.
> >>> >
> >>> > Anyways, in my Python/Ruby scripts, I'm taking slices 5000 super
> >>> > columns long from a single row.  It takes 13 seconds with Ruby and 8
> >>> > seconds with pycassa to get a single slice. In other words, it's
> >>> > currently reading at less than 500 kB per second. The time seems to
> >>> > scale linearly with the length of a slice (e.g. 6 seconds for 2500
> >>> > super columns with Ruby). If I run nodetool cfstats while my script
> >>> > is running, it tells me that my read latency on the column family is
> >>> > ~300ms.
> >>> >
> >>> > I assume that this is not normal and thus was wondering what
> >>> > parameters I could tweak to improve the performance.
> >>> >
> >>>
> >>> Is your client multi-threaded?  The single-threaded performance of
> >>> Cassandra isn't at all impressive; it really is designed for
> >>> dealing with a lot of simultaneous requests.
> >>>
> >>>
> >>> --
> >>> Aaron Turner
> >>> http://synfin.net/         Twitter: @synfinatic
> >>> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
> >>> Windows
> >>> Those who would give up essential Liberty, to purchase a little
> >>> temporary Safety, deserve neither Liberty nor Safety.
> >>>     -- Benjamin Franklin
> >>> "carpe diem quam minimum credula postero"
> >>
> >>
> >>
> >>
> >> --
> >> Tyler Hobbs
> >> DataStax
> >>
> >
>
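
The smaller-batch advice quoted above (1000 super columns per request
instead of 5000) amounts to paging through the wide row. Below is a minimal
sketch of that loop, written against a hypothetical fetch_slice(start, count)
callable rather than a real client so that it runs on its own; the helper
names and the in-memory stand-in row are assumptions for illustration. With
pycassa, fetch_slice might wrap ColumnFamily.get with its column_start and
column_count arguments, but that wiring is not shown or tested here.

```python
import bisect


def iter_row_in_batches(fetch_slice, batch_size=1000):
    """Yield (name, value) pairs from one wide row, batch_size at a time.

    fetch_slice(start, count) is a hypothetical callable returning up to
    `count` (name, value) pairs in sorted order, beginning at `start`
    inclusive ('' means the beginning of the row).
    """
    start = ""
    first = True
    while True:
        # After the first batch, `start` is the last name already yielded,
        # so request one extra item and drop the duplicate.
        count = batch_size if first else batch_size + 1
        batch = list(fetch_slice(start, count))
        if not first:
            batch = batch[1:]
        if not batch:
            return
        for item in batch:
            yield item
        if len(batch) < batch_size:
            return  # short batch: the row is exhausted
        start = batch[-1][0]
        first = False


def make_fake_fetcher(row):
    """In-memory stand-in for a client call, for demonstration only."""
    names = sorted(row)

    def fetch_slice(start, count):
        i = bisect.bisect_left(names, start)
        return [(n, row[n]) for n in names[i:i + count]]

    return fetch_slice


# Made-up row with 2500 entries, read back in batches of 1000.
row = dict(("sc%05d" % i, {"col": i}) for i in range(2500))
items = list(iter_row_in_batches(make_fake_fetcher(row), batch_size=1000))
print(len(items))
```

Each request restarts at the last name already seen, so the loop keeps only
one batch in memory at a time.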

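The multi-threaded suggestion quoted above could look roughly like the
following: several reads issued at once, for example one per row key. This
is only a sketch; fetch_row is a hypothetical stand-in for a client read,
the row keys are invented, and concurrent.futures is in the standard
library only from Python 3.2 onwards.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_rows_concurrently(fetch_row, row_keys, max_workers=4):
    """Call fetch_row(key) for every key on a thread pool.

    Returns a dict mapping each key to its result.
    """
    keys = list(row_keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with keys.
        results = list(pool.map(fetch_row, keys))
    return dict(zip(keys, results))


def fake_fetch_row(key):
    """Stand-in for a network read: sleeps briefly, returns dummy data."""
    time.sleep(0.05)
    return {"row": key, "super_columns": 5000}


rows = fetch_rows_concurrently(fake_fetch_row, ["row1", "row2"])
print(sorted(rows))
```

Whether this helps in practice depends on the client and the server; the
thread only suggests trying it.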