tl;dr: It seems the DataStax client, though otherwise well written and performant, is, in its current form for 1.2.x and below, a non-starter for folks requiring high-performance inserts.
Corroborating others' findings on this thread, and several posts over the past couple of weeks, I just ran a series of tests for a client, and Astyanax outperformed the DataStax driver by a factor of 3 to 1 in a single-threaded (for simplicity's sake, and to reduce potential variables) load of time-series data. Paul's example below is pretty much the same approach I took to use the existing CQL3 table definition from Thrift.

Brian O'Neill has a pair of good blog posts on this topic with more detail:
http://brianoneill.blogspot.com/2012/09/composite-keys-connecting-dots-between.html
http://brianoneill.blogspot.com/2012/10/cql-astyanax-and-compoundcomposite-keys.html

Per Keith's findings on compatibility, see:
https://github.com/Netflix/astyanax/issues/391

On Thu, Sep 12, 2013 at 3:26 PM, Paul Cichonski <paul.cichon...@lithium.com> wrote:

> I'm running Cassandra 1.2.6 without compact storage on my tables. The
> trick is making your Astyanax (I'm running 1.56.42) mutation work with the
> CQL table definition. This is definitely a bit of a hack, since most of the
> advice says don't mix the CQL and Thrift APIs, so it's your call on how far
> you want to go. If you still want to test it out, you need to leverage the
> Astyanax CompositeColumn construct to make it work:
> https://github.com/Netflix/astyanax/wiki/Composite-columns
>
> I've provided a slightly modified version of what I am doing below.
>
> CQL table definition:
>
> CREATE TABLE standard_subscription_index
> (
>     subscription_type text,
>     subscription_target_id text,
>     entitytype text,
>     entityid int,
>     creationtimestamp timestamp,
>     indexed_tenant_id uuid,
>     deleted boolean,
>     PRIMARY KEY ((subscription_type, subscription_target_id), entitytype, entityid)
> )
>
> ColumnFamily definition:
>
> private static final ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn> COMPOSITE_ROW_COLUMN =
>     new ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn>(
>         SUBSCRIPTION_CF_NAME,
>         new AnnotatedCompositeSerializer<SubscriptionIndexCompositeKey>(SubscriptionIndexCompositeKey.class),
>         new AnnotatedCompositeSerializer<SubscribingEntityCompositeColumn>(SubscribingEntityCompositeColumn.class));
>
> SubscriptionIndexCompositeKey is a class that contains the fields from the
> row key (e.g., subscription_type, subscription_target_id), and
> SubscribingEntityCompositeColumn contains the fields from the composite
> column (as it would look if you viewed your data using cassandra-cli), so:
> entityType, entityId, columnName.
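To make that mapping concrete: the two annotated classes would look roughly like this. This is my reconstruction from the table definition and the composite-columns wiki page Paul links, not his actual code:

import com.netflix.astyanax.annotations.Component;

// Composite row key: the two partition-key columns from PRIMARY KEY ((...)).
public class SubscriptionIndexCompositeKey {
    @Component(ordinal = 0) public String subscriptionType;     // subscription_type
    @Component(ordinal = 1) public String subscriptionTargetId; // subscription_target_id

    public SubscriptionIndexCompositeKey() {} // no-arg constructor for the serializer
    public SubscriptionIndexCompositeKey(String type, String targetId) {
        this.subscriptionType = type;
        this.subscriptionTargetId = targetId;
    }
}

// Composite column name: the clustering columns, plus the name of the CQL
// column whose value this Thrift cell holds.
public class SubscribingEntityCompositeColumn {
    @Component(ordinal = 0) public String entityType; // entitytype
    @Component(ordinal = 1) public int entityId;      // entityid
    @Component(ordinal = 2) public String columnName; // "creationtimestamp", "deleted", "indexed_tenant_id"

    public SubscribingEntityCompositeColumn() {}
    public SubscribingEntityCompositeColumn(String entityType, int entityId, String columnName) {
        this.entityType = entityType;
        this.entityId = entityId;
        this.columnName = columnName;
    }
}

The columnName component is what spreads one CQL row's values across several Thrift cells, which is the tricky part Paul describes next.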
> The columnName field is the tricky part, as it defines what to interpret
> the column value as (i.e., if it is a value for creationtimestamp, the
> column might be "someEntityType:4:creationtimestamp").
>
> The actual mutation looks something like this:
>
> final MutationBatch mutation = getKeyspace().prepareMutationBatch();
> final ColumnListMutation<SubscribingEntityCompositeColumn> row =
>     mutation.withRow(COMPOSITE_ROW_COLUMN,
>         new SubscriptionIndexCompositeKey(targetEntityType.getName(), targetEntityId));
>
> for (Subscription sub : subs) {
>     row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
>             sub.getEntityId(), "creationtimestamp"),
>         sub.getCreationTimestamp());
>     row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
>             sub.getEntityId(), "deleted"),
>         sub.isDeleted());
>     row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
>             sub.getEntityId(), "indexed_tenant_id"),
>         tenantId);
> }
>
> Hope that helps,
> Paul
>
> From: Keith Freeman [mailto:8fo...@gmail.com]
> Sent: Thursday, September 12, 2013 12:10 PM
> To: user@cassandra.apache.org
> Subject: Re: heavy insert load overloads CPUs, with MutationStage pending
>
> Ok, your results are pretty impressive, so I'm giving it a try. I've made
> some initial attempts to use Astyanax 1.56.37, but have run into some trouble:
>
> - it's not compatible with 1.2.8 client-side (NoSuchMethodError on
>   org.apache.cassandra.thrift.TBinaryProtocol, whose signature changed
>   after 1.2.5)
> - even after switching to Cassandra 1.2.5 servers, it's been difficult to
>   get simple examples to work unless I use column families defined WITH
>   COMPACT STORAGE
>
> How did you handle these problems? How much effort did it take you to
> switch from DataStax to Astyanax?
>
> I feel like I'm getting lost in a pretty deep rabbit hole here.
>
> On 09/11/2013 03:03 PM, Paul Cichonski wrote:
> I was reluctant to use Thrift as well, and I spent about a week trying
> to get the CQL inserts to work by partitioning the INSERTs in different
> ways and tuning the cluster.
>
> However, nothing worked remotely as well as batch_mutate when it came
> to writing a full wide row at once. I think Cassandra 2.0 makes CQL work
> better for these cases (CASSANDRA-4693), but I haven't tested it yet.
>
> -Paul
>
> -----Original Message-----
> From: Keith Freeman [mailto:8fo...@gmail.com]
> Sent: Wednesday, September 11, 2013 1:06 PM
> To: user@cassandra.apache.org
> Subject: Re: heavy insert load overloads CPUs, with MutationStage pending
>
> Thanks, I had seen your Stack Overflow post. I've got hundreds of
> (wide) rows, and the writes are pretty well distributed across them.
> I'm very reluctant to drop back to the Thrift interface.
>
> On 09/11/2013 10:46 AM, Paul Cichonski wrote:
> How much of the data you are writing is going against the same row key?
>
> I've experienced some issues using CQL to write a full wide row at once
> (across multiple threads) that exhibited some of the symptoms you have
> described (i.e., high CPU, dropped mutations).
>
> This question goes into it a bit more:
> http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque
> I was able to solve my issue by switching to the Thrift batch_mutate to
> write a full wide row at once instead of using many CQL INSERT statements.
>
> -Paul
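To make the contrast concrete: under CQL 3, the wide-row write from Paul's example higher up in the thread becomes one INSERT per (entitytype, entityid) pair. A minimal sketch of that path with the DataStax driver follows; the helper signature and bindings are my assumptions, not code from this thread:

import java.util.UUID;

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class SubscriptionCqlWriter {
    // One prepared INSERT, executed once per subscription. Every execute()
    // is a separate request that re-sends the partition key, versus one
    // batch_mutate carrying the whole wide row in a single call.
    public void writeSubscriptions(Session session, String subscriptionType,
            String targetId, Iterable<Subscription> subs, UUID tenantId) {
        PreparedStatement insert = session.prepare(
            "INSERT INTO standard_subscription_index " +
            "(subscription_type, subscription_target_id, entitytype, entityid, " +
            "creationtimestamp, indexed_tenant_id, deleted) " +
            "VALUES (?, ?, ?, ?, ?, ?, ?)");
        for (Subscription sub : subs) {
            session.execute(insert.bind(
                subscriptionType, targetId,
                sub.getEntityType().getName(), sub.getEntityId(),
                sub.getCreationTimestamp(), tenantId, sub.isDeleted()));
        }
    }
}

As Paul notes above, even batching these INSERTs didn't come close to batch_mutate on 1.2; CASSANDRA-4693 is the relevant improvement in 2.0.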
> -----Original Message-----
> From: Keith Freeman [mailto:8fo...@gmail.com]
> Sent: Wednesday, September 11, 2013 9:16 AM
> To: user@cassandra.apache.org
> Subject: Re: heavy insert load overloads CPUs, with MutationStage pending
>
> On 09/10/2013 11:42 AM, Nate McCall wrote:
> With SSDs, you can turn up memtable_flush_writers - try 3 initially
> (1 by default) and see what happens. However, given that there are
> no entries in 'All time blocked' for such, it may be something else.
>
> Tried that; it seems to have reduced the loads a little after
> everything warmed up, but not much.
>
> How are you inserting the data?
>
> A Java client on a separate box using the DataStax Java driver: 48
> threads, each writing 100 records per iteration as prepared batch
> statements.
>
> At 5000 records/sec, the servers just can't keep up, so the client backs
> up. That's only about 5 MB of data/sec (roughly 1 KB per record), which
> doesn't seem like much. As I mentioned, switching to SSDs didn't help
> much, so I'm assuming at this point that the server overloads are what's
> holding up the client.
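For reference, the client write path Keith describes (48 threads, batches of 100 prepared inserts) would look something like the sketch below. Everything concrete in it (keyspace, table schema, value sizes, and preparing the whole BATCH as a single statement, since the 1.x driver predates BatchStatement) is my assumption, not code from this thread:

import java.util.Date;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class InsertLoad {
    static final int THREADS = 48;
    static final int BATCH_SIZE = 100;

    // Assumed schema:
    // CREATE TABLE events (source text, ts timestamp, value text,
    //                      PRIMARY KEY (source, ts))
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        final Session session = cluster.connect("metrics");

        // Build and prepare one BATCH containing BATCH_SIZE parameterized INSERTs.
        StringBuilder cql = new StringBuilder("BEGIN UNLOGGED BATCH\n");
        for (int i = 0; i < BATCH_SIZE; i++) {
            cql.append("INSERT INTO events (source, ts, value) VALUES (?, ?, ?);\n");
        }
        cql.append("APPLY BATCH;");
        final PreparedStatement batch = session.prepare(cql.toString());

        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int t = 0; t < THREADS; t++) {
            pool.submit(new Runnable() {
                @Override
                public void run() {
                    Object[] params = new Object[BATCH_SIZE * 3];
                    while (!Thread.currentThread().isInterrupted()) { // runs until killed
                        for (int i = 0; i < BATCH_SIZE; i++) {
                            params[3 * i] = "source-" + (i % 16); // partition key
                            params[3 * i + 1] = new Date();       // clustering column
                            params[3 * i + 2] = "payload";        // ~1 KB of data in the real test
                        }
                        session.execute(batch.bind(params)); // 100 rows per round trip
                    }
                }
            });
        }
    }
}

Swapping that loop for the Astyanax MutationBatch approach shown earlier in the thread is essentially the comparison behind my 3-to-1 numbers.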