tl;dr: It seems the DataStax client, though otherwise well written and performant, is, in its current form for 1.2.x and below, a non-starter for folks requiring high-performance inserts.
Corroborating others' findings on this thread, and several posts over the past couple of weeks, I just ran a series of tests for a client, and Astyanax outperformed the DataStax driver by a factor of 3 to 1 in a single-threaded (for simplicity's sake, and to reduce potential variables) load of time-series data. Paul's example below is pretty much the same approach I took to use the existing CQL3 table definition from Thrift.

Brian O'Neill has a pair of good blog posts on this topic with more detail:
http://brianoneill.blogspot.com/2012/09/composite-keys-connecting-dots-between.html
http://brianoneill.blogspot.com/2012/10/cql-astyanax-and-compoundcomposite-keys.html

Per Keith's findings on compatibility, see:
https://github.com/Netflix/astyanax/issues/391

On Thu, Sep 12, 2013 at 3:26 PM, Paul Cichonski <paul.cichon...@lithium.com> wrote:

> I'm running Cassandra 1.2.6 without compact storage on my tables. The
> trick is making your Astyanax (I'm running 1.56.42) mutation work with the
> CQL table definition. This is definitely a bit of a hack, since most of the
> advice says don't mix the CQL and Thrift APIs, so it's your call on how far
> you want to go. If you still want to test it out, you need to leverage the
> Astyanax CompositeColumn construct to make it work:
> https://github.com/Netflix/astyanax/wiki/Composite-columns
>
> I've provided a slightly modified version of what I am doing below.
>
> CQL table definition:
>
> CREATE TABLE standard_subscription_index
> (
>     subscription_type text,
>     subscription_target_id text,
>     entitytype text,
>     entityid int,
>     creationtimestamp timestamp,
>     indexed_tenant_id uuid,
>     deleted boolean,
>     PRIMARY KEY ((subscription_type, subscription_target_id), entitytype, entityid)
> )
>
> ColumnFamily definition:
>
> private static final ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn> COMPOSITE_ROW_COLUMN =
>     new ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn>(
>         SUBSCRIPTION_CF_NAME,
>         new AnnotatedCompositeSerializer<SubscriptionIndexCompositeKey>(SubscriptionIndexCompositeKey.class),
>         new AnnotatedCompositeSerializer<SubscribingEntityCompositeColumn>(SubscribingEntityCompositeColumn.class));
>
> SubscriptionIndexCompositeKey is a class that contains the fields from the
> row key (e.g., subscription_type, subscription_target_id), and
> SubscribingEntityCompositeColumn contains the fields from the composite
> column (as it would look if you viewed your data using cassandra-cli), so:
> entityType, entityId, columnName.
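To make that mapping concrete: the two annotated classes would look roughly like this. This is my reconstruction from the table definition and the composite-columns wiki page Paul links, not his actual code:

import com.netflix.astyanax.annotations.Component;

// Composite row key: the two partition-key columns from PRIMARY KEY ((...)).
public class SubscriptionIndexCompositeKey {
    @Component(ordinal = 0) public String subscriptionType;     // subscription_type
    @Component(ordinal = 1) public String subscriptionTargetId; // subscription_target_id

    public SubscriptionIndexCompositeKey() {} // no-arg constructor for the serializer
    public SubscriptionIndexCompositeKey(String type, String targetId) {
        this.subscriptionType = type;
        this.subscriptionTargetId = targetId;
    }
}

// Composite column name: the clustering columns, plus the name of the CQL
// column whose value this Thrift cell holds.
public class SubscribingEntityCompositeColumn {
    @Component(ordinal = 0) public String entityType; // entitytype
    @Component(ordinal = 1) public int entityId;      // entityid
    @Component(ordinal = 2) public String columnName; // "creationtimestamp", "deleted", "indexed_tenant_id"

    public SubscribingEntityCompositeColumn() {}
    public SubscribingEntityCompositeColumn(String entityType, int entityId, String columnName) {
        this.entityType = entityType;
        this.entityId = entityId;
        this.columnName = columnName;
    }
}

The columnName component is what spreads one CQL row's values across several Thrift cells, which is the tricky part Paul describes next.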
> The columnName field is the tricky part, as it defines what to interpret
> the column value as (i.e., if it is a value for creationtimestamp, the
> column might be "someEntityType:4:creationtimestamp").
>
> The actual mutation looks something like this:
>
> final MutationBatch mutation = getKeyspace().prepareMutationBatch();
> final ColumnListMutation<SubscribingEntityCompositeColumn> row =
>     mutation.withRow(COMPOSITE_ROW_COLUMN,
>         new SubscriptionIndexCompositeKey(targetEntityType.getName(), targetEntityId));
>
> for (Subscription sub : subs) {
>     row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
>             sub.getEntityId(), "creationtimestamp"),
>         sub.getCreationTimestamp());
>     row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
>             sub.getEntityId(), "deleted"),
>         sub.isDeleted());
>     row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
>             sub.getEntityId(), "indexed_tenant_id"),
>         tenantId);
> }
>
> Hope that helps,
> Paul
>
> From: Keith Freeman [mailto:8fo...@gmail.com]
> Sent: Thursday, September 12, 2013 12:10 PM
> To: user@cassandra.apache.org
> Subject: Re: heavy insert load overloads CPUs, with MutationStage pending
>
> Ok, your results are pretty impressive, so I'm giving it a try. I've made
> some initial attempts to use Astyanax 1.56.37, but have run into some trouble:
>
> - it's not compatible with 1.2.8 client-side (NoSuchMethodError on
>   org.apache.cassandra.thrift.TBinaryProtocol, whose signature changed
>   after 1.2.5)
> - even after switching to Cassandra 1.2.5 servers, it's been difficult to
>   get simple examples to work unless I use column families defined WITH
>   COMPACT STORAGE
>
> How did you handle these problems? How much effort did it take you to
> switch from DataStax to Astyanax?
>
> I feel like I'm getting lost in a pretty deep rabbit hole here.
>
> On 09/11/2013 03:03 PM, Paul Cichonski wrote:
> I was reluctant to use Thrift as well, and I spent about a week trying
> to get the CQL inserts to work by partitioning the INSERTs in different
> ways and tuning the cluster.
>
> However, nothing worked remotely as well as batch_mutate when it came
> to writing a full wide row at once. I think Cassandra 2.0 makes CQL work
> better for these cases (CASSANDRA-4693), but I haven't tested it yet.
>
> -Paul
>
> -----Original Message-----
> From: Keith Freeman [mailto:8fo...@gmail.com]
> Sent: Wednesday, September 11, 2013 1:06 PM
> To: user@cassandra.apache.org
> Subject: Re: heavy insert load overloads CPUs, with MutationStage pending
>
> Thanks, I had seen your Stack Overflow post. I've got hundreds of
> (wide) rows, and the writes are pretty well distributed across them.
> I'm very reluctant to drop back to the Thrift interface.
>
> On 09/11/2013 10:46 AM, Paul Cichonski wrote:
> How much of the data you are writing is going against the same row key?
>
> I've experienced some issues using CQL to write a full wide row at once
> (across multiple threads) that exhibited some of the symptoms you have
> described (i.e., high CPU, dropped mutations).
>
> This question goes into it a bit more:
> http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque
> I was able to solve my issue by switching to the Thrift batch_mutate to
> write a full wide row at once instead of using many CQL INSERT statements.
>
> -Paul
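To make the contrast concrete: under CQL 3, the wide-row write from Paul's example higher up in the thread becomes one INSERT per (entitytype, entityid) pair. A minimal sketch of that path with the DataStax driver follows; the helper signature and bindings are my assumptions, not code from this thread:

import java.util.UUID;

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class SubscriptionCqlWriter {
    // One prepared INSERT, executed once per subscription. Every execute()
    // is a separate request that re-sends the partition key, versus one
    // batch_mutate carrying the whole wide row in a single call.
    public void writeSubscriptions(Session session, String subscriptionType,
            String targetId, Iterable<Subscription> subs, UUID tenantId) {
        PreparedStatement insert = session.prepare(
            "INSERT INTO standard_subscription_index " +
            "(subscription_type, subscription_target_id, entitytype, entityid, " +
            "creationtimestamp, indexed_tenant_id, deleted) " +
            "VALUES (?, ?, ?, ?, ?, ?, ?)");
        for (Subscription sub : subs) {
            session.execute(insert.bind(
                subscriptionType, targetId,
                sub.getEntityType().getName(), sub.getEntityId(),
                sub.getCreationTimestamp(), tenantId, sub.isDeleted()));
        }
    }
}

As Paul notes above, even batching these INSERTs didn't come close to batch_mutate on 1.2; CASSANDRA-4693 is the relevant improvement in 2.0.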
> -----Original Message-----
> From: Keith Freeman [mailto:8fo...@gmail.com]
> Sent: Wednesday, September 11, 2013 9:16 AM
> To: user@cassandra.apache.org
> Subject: Re: heavy insert load overloads CPUs, with MutationStage pending
>
> On 09/10/2013 11:42 AM, Nate McCall wrote:
> With SSDs, you can turn up memtable_flush_writers - try 3 initially
> (1 by default) and see what happens. However, given that there are
> no entries in 'All time blocked' for such, it may be something else.
>
> Tried that; it seems to have reduced the loads a little after
> everything warmed up, but not much.
>
> How are you inserting the data?
>
> A Java client on a separate box using the DataStax Java driver: 48
> threads, each writing 100 records per iteration as prepared batch
> statements.
>
> At 5000 records/sec, the servers just can't keep up, so the client backs
> up. That's only about 5 MB of data/sec (roughly 1 KB per record), which
> doesn't seem like much. As I mentioned, switching to SSDs didn't help
> much, so I'm assuming at this point that the server overloads are what's
> holding up the client.
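For reference, the client write path Keith describes (48 threads, batches of 100 prepared inserts) would look something like the sketch below. Everything concrete in it (keyspace, table schema, value sizes, and preparing the whole BATCH as a single statement, since the 1.x driver predates BatchStatement) is my assumption, not code from this thread:

import java.util.Date;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class InsertLoad {
    static final int THREADS = 48;
    static final int BATCH_SIZE = 100;

    // Assumed schema:
    // CREATE TABLE events (source text, ts timestamp, value text,
    //                      PRIMARY KEY (source, ts))
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        final Session session = cluster.connect("metrics");

        // Build and prepare one BATCH containing BATCH_SIZE parameterized INSERTs.
        StringBuilder cql = new StringBuilder("BEGIN UNLOGGED BATCH\n");
        for (int i = 0; i < BATCH_SIZE; i++) {
            cql.append("INSERT INTO events (source, ts, value) VALUES (?, ?, ?);\n");
        }
        cql.append("APPLY BATCH;");
        final PreparedStatement batch = session.prepare(cql.toString());

        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int t = 0; t < THREADS; t++) {
            pool.submit(new Runnable() {
                @Override
                public void run() {
                    Object[] params = new Object[BATCH_SIZE * 3];
                    while (!Thread.currentThread().isInterrupted()) { // runs until killed
                        for (int i = 0; i < BATCH_SIZE; i++) {
                            params[3 * i] = "source-" + (i % 16); // partition key
                            params[3 * i + 1] = new Date();       // clustering column
                            params[3 * i + 2] = "payload";        // ~1 KB of data in the real test
                        }
                        session.execute(batch.bind(params)); // 100 rows per round trip
                    }
                }
            });
        }
    }
}

Swapping that loop for the Astyanax MutationBatch approach shown earlier in the thread is essentially the comparison behind my 3-to-1 numbers.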