I'm running Cassandra 1.2.6 without compact storage on my tables. The trick is making your Astyanax (I'm running 1.56.42) mutation work with the CQL table definition. This is definitely a bit of a hack, since most of the advice says not to mix the CQL and Thrift APIs, so it is your call on how far you want to go. If you still want to try it out, you need to leverage the Astyanax CompositeColumn construct to make it work (https://github.com/Netflix/astyanax/wiki/Composite-columns).

I've provided a slightly modified version of what I am doing below.

CQL table definition:

    CREATE TABLE standard_subscription_index (
        subscription_type text,
        subscription_target_id text,
        entitytype text,
        entityid int,
        creationtimestamp timestamp,
        indexed_tenant_id uuid,
        deleted boolean,
        PRIMARY KEY ((subscription_type, subscription_target_id), entitytype, entityid)
    )

ColumnFamily definition:

    private static final ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn> COMPOSITE_ROW_COLUMN =
        new ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn>(
            SUBSCRIPTION_CF_NAME,
            new AnnotatedCompositeSerializer<SubscriptionIndexCompositeKey>(SubscriptionIndexCompositeKey.class),
            new AnnotatedCompositeSerializer<SubscribingEntityCompositeColumn>(SubscribingEntityCompositeColumn.class));

SubscriptionIndexCompositeKey is a class that contains the fields of the row key (e.g., subscription_type, subscription_target_id), and SubscribingEntityCompositeColumn contains the fields of the composite column (as it would look if you viewed your data using cassandra-cli): entityType, entityId, columnName. The columnName field is the tricky part, as it defines how to interpret the column value (e.g., if the value is for creationtimestamp, the column name might be "someEntityType:4:creationtimestamp").
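To make that concrete, the two composite classes look roughly like this (a sketch only: the field types are taken from the CQL definition above, everything else is assumed; Astyanax's @Component ordinal controls the serialization order):

    import com.netflix.astyanax.annotations.Component;

    // Maps to the composite row key: (subscription_type, subscription_target_id)
    public class SubscriptionIndexCompositeKey {
        @Component(ordinal = 0)
        private String subscriptionType;

        @Component(ordinal = 1)
        private String subscriptionTargetId;

        // AnnotatedCompositeSerializer instantiates via reflection, so keep a no-arg constructor
        public SubscriptionIndexCompositeKey() {}

        public SubscriptionIndexCompositeKey(String subscriptionType, String subscriptionTargetId) {
            this.subscriptionType = subscriptionType;
            this.subscriptionTargetId = subscriptionTargetId;
        }
    }

    // Maps to the composite column name: (entitytype, entityid, <name of the value column>)
    public class SubscribingEntityCompositeColumn {
        @Component(ordinal = 0)
        private String entityType;

        @Component(ordinal = 1)
        private int entityId;

        @Component(ordinal = 2)
        private String columnName;

        public SubscribingEntityCompositeColumn() {}

        public SubscribingEntityCompositeColumn(String entityType, int entityId, String columnName) {
            this.entityType = entityType;
            this.entityId = entityId;
            this.columnName = columnName;
        }

        public String getColumnName() {
            return columnName;
        }
    }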
The actual mutation looks something like this:

    final MutationBatch mutation = getKeyspace().prepareMutationBatch();
    final ColumnListMutation<SubscribingEntityCompositeColumn> row = mutation.withRow(
        COMPOSITE_ROW_COLUMN,
        new SubscriptionIndexCompositeKey(targetEntityType.getName(), targetEntityId));
    for (Subscription sub : subs) {
        row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
            sub.getEntityId(), "creationtimestamp"), sub.getCreationTimestamp());
        row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
            sub.getEntityId(), "deleted"), sub.isDeleted());
        row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
            sub.getEntityId(), "indexed_tenant_id"), tenantId);
    }
    // and then execute the batch
    mutation.execute();

Hope that helps,
Paul
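P.S. Reading the wide row back goes through the same ColumnFamily, something like this (again a sketch; the columnName dispatch and the getter on the column class are illustrative, not lifted from my code):

    import java.util.Date;
    import java.util.UUID;
    import com.netflix.astyanax.connectionpool.OperationResult;
    import com.netflix.astyanax.model.Column;
    import com.netflix.astyanax.model.ColumnList;

    // execute() throws ConnectionException
    final OperationResult<ColumnList<SubscribingEntityCompositeColumn>> result =
        getKeyspace().prepareQuery(COMPOSITE_ROW_COLUMN)
            .getKey(new SubscriptionIndexCompositeKey(targetEntityType.getName(), targetEntityId))
            .execute();

    for (Column<SubscribingEntityCompositeColumn> column : result.getResult()) {
        // the composite column name tells you how to decode the value
        final SubscribingEntityCompositeColumn name = column.getName();
        if ("creationtimestamp".equals(name.getColumnName())) {
            final Date created = column.getDateValue();
        } else if ("deleted".equals(name.getColumnName())) {
            final boolean deleted = column.getBooleanValue();
        } else if ("indexed_tenant_id".equals(name.getColumnName())) {
            final UUID tenantId = column.getUUIDValue();
        }
    }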
From: Keith Freeman [mailto:8fo...@gmail.com]
Sent: Thursday, September 12, 2013 12:10 PM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

Ok, your results are pretty impressive, I'm giving it a try. I've made some initial attempts to use Astyanax 1.56.37, but have some troubles:

- it's not compatible with 1.2.8 client-side (NoSuchMethodErrors on org.apache.cassandra.thrift.TBinaryProtocol, which changed its signature since 1.2.5)
- even switching to C* 1.2.5 servers, it's been difficult getting simple examples to work unless I use CFs that have "WITH COMPACT STORAGE"

How did you handle these problems? How much effort did it take you to switch from datastax to astyanax? I feel like I'm getting lost in a pretty deep rabbit hole here.

On 09/11/2013 03:03 PM, Paul Cichonski wrote:

I was reluctant to use thrift as well, and I spent about a week trying to get the CQL inserts to work by partitioning the INSERTs in different ways and tuning the cluster. However, nothing worked remotely as well as batch_mutate when it came to writing a full wide row at once.

I think Cassandra 2.0 makes CQL work better for these cases (CASSANDRA-4693), but I haven't tested it yet.

-Paul

-----Original Message-----
From: Keith Freeman [mailto:8fo...@gmail.com]
Sent: Wednesday, September 11, 2013 1:06 PM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

Thanks, I had seen your stackoverflow post. I've got hundreds of (wide) rows, and the writes are pretty well distributed across them. I'm very reluctant to drop back to the thrift interface.

On 09/11/2013 10:46 AM, Paul Cichonski wrote:

How much of the data you are writing is going against the same row key? I've experienced some issues using CQL to write a full wide row at once (across multiple threads) that exhibited some of the symptoms you have described (i.e., high CPU, dropped mutations).

This question goes into it a bit more: http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque

I was able to solve my issue by switching to the thrift batch_mutate to write a full wide row at once instead of using many CQL INSERT statements.

-Paul

-----Original Message-----
From: Keith Freeman [mailto:8fo...@gmail.com]
Sent: Wednesday, September 11, 2013 9:16 AM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

On 09/10/2013 11:42 AM, Nate McCall wrote:

With SSDs, you can turn up memtable_flush_writers - try 3 initially (1 by default) and see what happens. However, given that there are no entries in 'All time blocked' for such, it may be something else.

Tried that, it seems to have reduced the loads a little after everything warmed up, but not much.

How are you inserting the data?

A java client on a separate box using the datastax java driver, 48 threads writing 100 records each iteration as prepared batch statements. At 5000 records/sec, the servers just can't keep up, so the client backs up. That's only 5M of data/sec, which doesn't seem like much. As I mentioned, switching to SSDs didn't help much, so I'm assuming at this point that the server overloads are what's holding up the client.
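For reference, the memtable_flush_writers change Nate suggests above is a one-line edit in cassandra.yaml (3 is just his suggested starting point, so treat this as a sketch):

    # cassandra.yaml (Cassandra 1.2.x)
    # Number of memtable flush writer threads; per the advice above it
    # defaults to 1 and can be raised when the disks (e.g., SSDs) can keep up.
    memtable_flush_writers: 3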