How big are the batch sizes? In other words, how many rows are you sending
per insert operation?

Other than the above, not much else to suggest without seeing some example
code (on pastebin, gist or similar, ideally).
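
For reference, here's roughly what I mean by rows per insert operation: a
minimal sketch assuming the DataStax Java driver (2.0-style API). The
keyspace, table, and column names are hypothetical, just mirroring the
schema you described:

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.utils.UUIDs;

public class BatchInsertSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("demo_ks");  // hypothetical keyspace

        // Hypothetical table modeled on your description:
        //   CREATE TABLE events (a text, b text, day text, ts timeuuid,
        //                        x text, y text, z text,
        //                        PRIMARY KEY ((a, b, day), ts));
        PreparedStatement ps = session.prepare(
            "INSERT INTO events (a, b, day, ts, x, y, z) VALUES (?, ?, ?, ?, ?, ?, ?)");

        // One "insert operation" carrying 100 rows. UNLOGGED skips the
        // atomic batch log, and all rows here share one partition key,
        // so the whole batch lands on a single replica set.
        BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
        for (int i = 0; i < 100; i++) {
            batch.add(ps.bind("a1", "b1", "2013-08-19", UUIDs.timeBased(),
                              "x" + i, "y" + i, "z" + i));
        }
        session.execute(batch);

        cluster.close();
    }
}

The reason I ask: batches that span many partitions make the coordinator do
extra work fanning rows out to replicas, which could contribute to the CPU
load you're seeing, whereas grouping rows by partition key tends to be much
cheaper.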

On Mon, Aug 19, 2013 at 5:49 PM, Keith Freeman <8fo...@gmail.com> wrote:

> I've got a 3-node cassandra cluster (16G/4-core VMs ESXi v5 on 2.5Ghz
> machines not shared with any other VMs).  I'm inserting time-series data
> into a single column-family using "wide rows" (timeuuids) and have a 3-part
> partition key, so my primary key is something like ((a, b, day),
> in-time-uuid), plus columns x, y, z.
>
> My java client is feeding rows (about 1k of raw data size each) in batches
> using multiple threads, and the fastest I can get it to run reliably is about
> 2000 rows/second.  Even at that speed, all 3 Cassandra nodes are very CPU
> bound, with loads of 6-9 each (and the client machine is hardly breaking a
> sweat).  I've tried turning off compression in my table, which reduced the
> loads slightly but not much.  There are no other updates or reads
> occurring, except for DataStax OpsCenter.
>
> I was expecting to be able to insert at least 10k rows/second with this
> configuration, and after a lot of reading of docs, blogs, and Google, I can't
> really figure out what's slowing my client down.  When I increase the
> insert speed of my client beyond 2000/second, the server responses are just
> too slow and the client falls behind.  I had a single-node MySQL database
> that could handle 10k of these data rows/second, so I really feel like I'm
> missing something in Cassandra.  Any ideas?