I'm running Cassandra 1.2.6 without compact storage on my tables. The trick is making your Astyanax (I'm running 1.56.42) mutation work with the CQL table definition. This is definitely a bit of a hack, since most of the advice says not to mix the CQL and Thrift APIs, so it is your call on how far you want to go. If you still want to try it out, you need to leverage the Astyanax CompositeColumn construct to make it work (https://github.com/Netflix/astyanax/wiki/Composite-columns).

I've provided a slightly modified version of what I am doing below.

CQL table definition:

    CREATE TABLE standard_subscription_index (
        subscription_type text,
        subscription_target_id text,
        entitytype text,
        entityid int,
        creationtimestamp timestamp,
        indexed_tenant_id uuid,
        deleted boolean,
        PRIMARY KEY ((subscription_type, subscription_target_id), entitytype, entityid)
    )

ColumnFamily definition:

    private static final ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn> COMPOSITE_ROW_COLUMN =
        new ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn>(
            SUBSCRIPTION_CF_NAME,
            new AnnotatedCompositeSerializer<SubscriptionIndexCompositeKey>(SubscriptionIndexCompositeKey.class),
            new AnnotatedCompositeSerializer<SubscribingEntityCompositeColumn>(SubscribingEntityCompositeColumn.class));

SubscriptionIndexCompositeKey is a class that contains the fields of the row key (e.g., subscription_type, subscription_target_id), and SubscribingEntityCompositeColumn contains the fields of the composite column (as it would look if you viewed your data using cassandra-cli): entityType, entityId, columnName. The columnName field is the tricky part, as it defines how to interpret the column value (e.g., if the value is for creationtimestamp, the column name might be "someEntityType:4:creationtimestamp").
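To make that concrete, the two composite classes look roughly like this (a sketch only: the field types are taken from the CQL definition above, everything else is assumed; Astyanax's @Component ordinal controls the serialization order):

    import com.netflix.astyanax.annotations.Component;

    // Maps to the composite row key: (subscription_type, subscription_target_id)
    public class SubscriptionIndexCompositeKey {
        @Component(ordinal = 0)
        private String subscriptionType;

        @Component(ordinal = 1)
        private String subscriptionTargetId;

        // AnnotatedCompositeSerializer instantiates via reflection, so keep a no-arg constructor
        public SubscriptionIndexCompositeKey() {}

        public SubscriptionIndexCompositeKey(String subscriptionType, String subscriptionTargetId) {
            this.subscriptionType = subscriptionType;
            this.subscriptionTargetId = subscriptionTargetId;
        }
    }

    // Maps to the composite column name: (entitytype, entityid, <name of the value column>)
    public class SubscribingEntityCompositeColumn {
        @Component(ordinal = 0)
        private String entityType;

        @Component(ordinal = 1)
        private int entityId;

        @Component(ordinal = 2)
        private String columnName;

        public SubscribingEntityCompositeColumn() {}

        public SubscribingEntityCompositeColumn(String entityType, int entityId, String columnName) {
            this.entityType = entityType;
            this.entityId = entityId;
            this.columnName = columnName;
        }

        public String getColumnName() {
            return columnName;
        }
    }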
The actual mutation looks something like this:

    final MutationBatch mutation = getKeyspace().prepareMutationBatch();
    final ColumnListMutation<SubscribingEntityCompositeColumn> row = mutation.withRow(
        COMPOSITE_ROW_COLUMN,
        new SubscriptionIndexCompositeKey(targetEntityType.getName(), targetEntityId));
    for (Subscription sub : subs) {
        row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
            sub.getEntityId(), "creationtimestamp"), sub.getCreationTimestamp());
        row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
            sub.getEntityId(), "deleted"), sub.isDeleted());
        row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
            sub.getEntityId(), "indexed_tenant_id"), tenantId);
    }
    // and then execute the batch
    mutation.execute();

Hope that helps,
Paul
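P.S. Reading the wide row back goes through the same ColumnFamily, something like this (again a sketch; the columnName dispatch and the getter on the column class are illustrative, not lifted from my code):

    import java.util.Date;
    import java.util.UUID;
    import com.netflix.astyanax.connectionpool.OperationResult;
    import com.netflix.astyanax.model.Column;
    import com.netflix.astyanax.model.ColumnList;

    // execute() throws ConnectionException
    final OperationResult<ColumnList<SubscribingEntityCompositeColumn>> result =
        getKeyspace().prepareQuery(COMPOSITE_ROW_COLUMN)
            .getKey(new SubscriptionIndexCompositeKey(targetEntityType.getName(), targetEntityId))
            .execute();

    for (Column<SubscribingEntityCompositeColumn> column : result.getResult()) {
        // the composite column name tells you how to decode the value
        final SubscribingEntityCompositeColumn name = column.getName();
        if ("creationtimestamp".equals(name.getColumnName())) {
            final Date created = column.getDateValue();
        } else if ("deleted".equals(name.getColumnName())) {
            final boolean deleted = column.getBooleanValue();
        } else if ("indexed_tenant_id".equals(name.getColumnName())) {
            final UUID tenantId = column.getUUIDValue();
        }
    }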
From: Keith Freeman [mailto:8fo...@gmail.com]
Sent: Thursday, September 12, 2013 12:10 PM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

Ok, your results are pretty impressive, I'm giving it a try. I've made some initial attempts to use Astyanax 1.56.37, but have some troubles:

- it's not compatible with 1.2.8 client-side (NoSuchMethodErrors on org.apache.cassandra.thrift.TBinaryProtocol, which changed its signature since 1.2.5)
- even switching to C* 1.2.5 servers, it's been difficult getting simple examples to work unless I use CFs that have "WITH COMPACT STORAGE"

How did you handle these problems? How much effort did it take you to switch from datastax to astyanax? I feel like I'm getting lost in a pretty deep rabbit hole here.

On 09/11/2013 03:03 PM, Paul Cichonski wrote:

I was reluctant to use thrift as well, and I spent about a week trying to get the CQL inserts to work by partitioning the INSERTs in different ways and tuning the cluster. However, nothing worked remotely as well as batch_mutate when it came to writing a full wide row at once.

I think Cassandra 2.0 makes CQL work better for these cases (CASSANDRA-4693), but I haven't tested it yet.

-Paul

-----Original Message-----
From: Keith Freeman [mailto:8fo...@gmail.com]
Sent: Wednesday, September 11, 2013 1:06 PM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

Thanks, I had seen your stackoverflow post. I've got hundreds of (wide) rows, and the writes are pretty well distributed across them. I'm very reluctant to drop back to the thrift interface.

On 09/11/2013 10:46 AM, Paul Cichonski wrote:

How much of the data you are writing is going against the same row key? I've experienced some issues using CQL to write a full wide row at once (across multiple threads) that exhibited some of the symptoms you have described (i.e., high CPU, dropped mutations).

This question goes into it a bit more: http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque

I was able to solve my issue by switching to the thrift batch_mutate to write a full wide row at once instead of using many CQL INSERT statements.

-Paul

-----Original Message-----
From: Keith Freeman [mailto:8fo...@gmail.com]
Sent: Wednesday, September 11, 2013 9:16 AM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

On 09/10/2013 11:42 AM, Nate McCall wrote:

With SSDs, you can turn up memtable_flush_writers - try 3 initially (1 by default) and see what happens. However, given that there are no entries in 'All time blocked' for such, it may be something else.

Tried that, it seems to have reduced the loads a little after everything warmed up, but not much.

How are you inserting the data?

A java client on a separate box using the datastax java driver, 48 threads writing 100 records each iteration as prepared batch statements. At 5000 records/sec, the servers just can't keep up, so the client backs up. That's only 5M of data/sec, which doesn't seem like much. As I mentioned, switching to SSDs didn't help much, so I'm assuming at this point that the server overloads are what's holding up the client.
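For reference, the memtable_flush_writers change Nate suggests above is a one-line edit in cassandra.yaml (3 is just his suggested starting point, so treat this as a sketch):

    # cassandra.yaml (Cassandra 1.2.x)
    # Number of memtable flush writer threads; per the advice above it
    # defaults to 1 and can be raised when the disks (e.g., SSDs) can keep up.
    memtable_flush_writers: 3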