sstable compression

2013-09-12 Thread Christopher Wirt
I currently use Snappy for my SSTable compression on Cassandra 1.2.8.

I would like to switch to using LZ4 compression for my SSTables. Would
simply altering the table definition mean that all newly written SSTables are
LZ4-compressed and can live in harmony with the existing Snappy SSTables?

Then naturally, over time, more of my data will become LZ4-compressed?

Have I missed something?

Thanks,

Chris

Re: VMs versus Physical machines

2013-09-12 Thread Shahab Yunus
I admit I left out details; sorry about that. The thing is that I was
looking for guidance at a high level, so that we can then sort out ourselves
what fits our requirements and use cases (mainly because we are at the stage
where they could still be molded according to hardware and software
limitations/features). So, for example, guidance like 'for heavy reads,
physical is better', etc.

Anyway, just to give you a quick recap:
1- Cassandra 1.2.8
2- A row is a unique userid and can have one or more columns. Every cell is
basically a blob of data (using Avro). All information is in this one
table. No joins or other access patterns.
3- Writes can be either in bulk (which will of course have less strict
performance requirements) or real-time. All writes are at the per-userid
(hence, row) level and consist of adding new rows (with some column values,
of course) or updating specific cells (columns) of an existing row.
4- Reads are per userid, i.e. per row, and 90% of the time are random reads
for a single user rather than in bulk.
5- Both read and write interfaces are exposed through a REST service as well
as a direct Java client API.
6- Reads and writes, as mentioned in 3 & 4, can be for one or more columns at
a time.

Regards,
Shahab




On Thu, Sep 12, 2013 at 1:51 AM, Aaron Turner  wrote:

> On Wed, Sep 11, 2013 at 4:40 PM, Shahab Yunus wrote:
>
>> Thanks Aaron for the reply. Yes, VMs or the nodes will be in cloud if we
>> don't go the physical route.
>>
>> " Look how Cassandra scales and provides redundancy.  "
>> But how does it differ for physical machines or VMs (in cloud.) Or after
>> your first comment, are you saying that there is no difference whether we
>> use physical or VMs (in cloud)?
>>
>
> They're different, but both can and do work... VMs just require more
> virtual servers than going the physical route.
>
> Sorry, but without you providing any actual information about your needs
> all you're going to get is generalizations and hand-waving.


RE: is the select result grouped by the value of the partition key?

2013-09-12 Thread John Lumby
Aaron, thanks for the super-rapid response. That clarifies a lot for me,
but I think I am still wondering about one point, embedded below.


> From: aa...@thelastpickle.com 
> Subject: Re: is the select result grouped by the value of the partition key? 
> Date: Thu, 12 Sep 2013 14:19:06 +1200 
> To: user@cassandra.apache.org 
>  
> GROUP BY "feature", 
> I would not think of it like that, this is about physical order of rows. 
>  
> since it seems really important yet does not seem to be mentioned in the 
> CQL reference documentation. 
> It's baked in, this is how the data is organised on the row. 

Yes, I see, and I absolutely get the relevance of where columns are stored
on disk to, say, doing INSERTs.
But what I am wondering about is that, in the context of a SELECT, we seem
to be relying on the Cassandra client API preserving that on-disk order
while returning rows.
My high-level understanding of how Cassandra handles a SELECT is
(excuse incorrect terminology):
  1.  The client connects to some node N.
  2.  Node N acts as a kind of coordinator and fires off the thrift or
binary-protocol messages to all other nodes to fetch rows off the
memtables and/or disks.
  3.  The coordinator merges, truncates, etc. the sets from the nodes and
returns one answer set to the client.

It is step 3 which has me wondering: does it explicitly preserve the
on-disk order? In fact, does it simply keep each individual node's answer
set separate? Is that how it works?
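A toy model of step 3 may make the question concrete. The sketch below is NOT Cassandra's actual coordinator code; it just illustrates the behavior being asked about, under the assumption that the coordinator reconciles replica answers per partition and emits each partition's rows contiguously (the names and data are invented):

```python
# Illustrative sketch only -- not Cassandra's real merge/reconcile code.
# Each replica returns rows as (partition_key, clustering_key, value) tuples.
# The coordinator reconciles duplicates, then emits rows one partition at a
# time, so rows for different partition keys are never interleaved.

def merge_replica_results(replica_results):
    """Merge per-replica row lists, keeping each partition's rows contiguous."""
    merged = {}
    for rows in replica_results:
        for pkey, ckey, value in rows:
            # Toy reconciliation: last write wins per (partition, clustering)
            # cell. Real Cassandra reconciles by cell timestamp, subject to
            # the Consistency Level of the read.
            merged.setdefault(pkey, {})[ckey] = value
    answer = []
    for pkey in merged:                      # one partition at a time
        for ckey in sorted(merged[pkey]):    # clustering order within it
            answer.append((pkey, ckey, merged[pkey][ckey]))
    return answer

# Two replicas with overlapping, differently-ordered answers:
node_a = [("user1", 1, "a"), ("user1", 2, "b")]
node_b = [("user2", 1, "x"), ("user1", 2, "b2")]
result = merge_replica_results([node_a, node_b])
# All "user1" rows come out contiguous, never interleaved with "user2" rows.
```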

>  
> http://www.datastax.com/dev/blog/thrift-to-cql3 
> We often say the PRIMARY KEY is the PARTITION KEY and the GROUPING COLUMNS 
> http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/cql_reference/create_table_r.html
>  
>  
> See also http://thelastpickle.com/blog/2013/01/11/primary-keys-in-cql.html 
>  
> Is it something we can bet the farm and farmer's family on? 
> Sure. 
>  
> The kinds of scenarios where I am wondering if it's possible for  
> partition-key groups 
> to get intermingled are : 
> All instances of the table entity with the same value(s) for the  
> PARTITION KEY portion of the PRIMARY KEY existing in the same storage  
> engine row. 
>  
>.   what if the node containing primary copy of a row is down 
> There is no primary copy of a row. 
>  
>.   what if there is a heavy stream of UPDATE activity from  
> applications which 
>connect to all nodes,   causing different nodes to have different  
> versions of replicas of same row? 
> That's fine with me. 
> It's only an issue when the data is read, and at that point the  
> Consistency Level determines what we do. 
>  
> Hope that helps. 
>  
>  
> - 
> Aaron Morton 
> New Zealand 
> @aaronmorton 
>  
> Co-Founder & Principal Consultant 
> Apache Cassandra Consulting 
> http://www.thelastpickle.com 
>  
> On 12/09/2013, at 7:43 AM, John Lumby <johnlu...@hotmail.com> wrote: 
>  
> I would like to make quite sure about this implicit GROUP BY "feature", 
>  
> since it seems really important yet does not seem to be mentioned in the 
> CQL reference documentation. 
>  
>  
>  
> Aaron,   you said "yes"  --   is that "yes,  always,   in all scenarios  
> no matter what" 
>  
> or "yes usually"?  Is it something we can bet the farm and farmer's  
> family on? 
>  
>  
>  
> The kinds of scenarios where I am wondering if it's possible for  
> partition-key groups 
> to get intermingled are : 
>  
>  
>  
>.   what if the node containing primary copy of a row is down 
>  and 
> cassandra fetches this row from a replica on a different node 
> (e.g.  with CONSISTENCY ONE) 
>  
>.   what if there is a heavy stream of UPDATE activity from  
> applications which 
>connect to all nodes,   causing different nodes to have different  
> versions of replicas of same row? 
>  
>  
>  
> Can you point me to some place in the cassandra source code where this  
> grouping is ensured? 
>  
>  
>  
> Many thanks, 
>  
> John Lumby 
> 

Eternal HintedHandoffs

2013-09-12 Thread Francisco Nogueira Calmon Sobral
Hi all!

According to the DataStax blog 
(http://www.datastax.com/dev/blog/modern-hinted-handoff), once a dead node is 
up again, all the nodes that hold hints for it start sending that information. 
However, it is also written that they check every 10 minutes for timed-out 
hints.

Do they behave like that forever? Do I need to restart the nodes? I'm 
concerned, since the load on the nodes shows (small) spikes every 10 minutes...

I'm using Cassandra 1.2.1.

Regards,
Francisco.




Re: VMs versus Physical machines

2013-09-12 Thread Aaron Turner
On Thu, Sep 12, 2013 at 5:42 AM, Shahab Yunus wrote:

> I admit I left out details; sorry about that. The thing is that I was
> looking for guidance at a high level, so that we can then sort out ourselves
> what fits our requirements and use cases (mainly because we are at the stage
> where they could still be molded according to hardware and software
> limitations/features). So, for example, guidance like 'for heavy reads,
> physical is better', etc.
>
> Anyway, just to give you a quick recap:
> 1- Cassandra 1.2.8
> 2- A row is a unique userid and can have one or more columns. Every cell is
> basically a blob of data (using Avro). All information is in this one
> table. No joins or other access patterns.
> 3- Writes can be either in bulk (which will of course have less strict
> performance requirements) or real-time. All writes are at the per-userid
> (hence, row) level and consist of adding new rows (with some column values,
> of course) or updating specific cells (columns) of an existing row.
> 4- Reads are per userid, i.e. per row, and 90% of the time are random reads
> for a single user rather than in bulk.
> 5- Both read and write interfaces are exposed through a REST service as
> well as a direct Java client API.
> 6- Reads and writes, as mentioned in 3 & 4, can be for one or more columns
> at a time.
>
> Regards,
> Shahab
>


Your total data set size and number of reads/writes per second are the
important things here.  Also, how sensitive are you to latency spikes (which
tend to happen with VMs)?

Long story short, the safest option is always physical, IMHO.  Use VMs/cloud
if you need to for some reason (e.g. all the other servers talking to
Cassandra are also in AWS).  Cloud can work (Netflix runs Cassandra on AWS),
but your performance will be a lot more consistent on physical hardware, and
Cassandra, like all databases, likes lots of RAM (although this can be offset
somewhat with SSDs), which tends to be expensive in the cloud.




-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
https://github.com/synfinatic/tcpreplay - Pcap editing and replay tools for
Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin


Re: sstable compression

2013-09-12 Thread Robert Coli
On Thu, Sep 12, 2013 at 2:13 AM, Christopher Wirt wrote:

> I would like to switch to using LZ4 compression for my SStables. Would
> simply altering the table definition mean that all newly written tables are
> LZ4 and can live in harmony with the existing Snappy SStables?
>

Yes. Per Aleksey in #cassandra on freenode, the compressor is stored in the
SSTable meta-information. This means that the compressor in the table
definition applies *only* to newly written SSTables.

=Rob
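For anyone following along, the schema change being discussed would look something like the following (the keyspace and table names are placeholders, not from this thread; syntax as of Cassandra 1.2 CQL3):

```sql
-- Switch this table's compressor; only newly written SSTables use LZ4.
ALTER TABLE mykeyspace.mytable
  WITH compression = {'sstable_compression': 'LZ4Compressor'};
```

Existing Snappy SSTables remain readable and are converted gradually as compaction rewrites them; if you want to rewrite them immediately instead, `nodetool upgradesstables` on that keyspace/table should do it (you may need its flag for including SSTables already at the current version).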


RE: heavy insert load overloads CPUs, with MutationStage pending

2013-09-12 Thread Paul Cichonski
I'm running Cassandra 1.2.6 without compact storage on my tables. The trick is 
making your Astyanax (I'm running 1.56.42) mutation work with the CQL table 
definition. (This is definitely a bit of a hack, since most of the advice says 
don't mix the CQL and Thrift APIs, so it is your call how far you want to go.) 
If you still want to try it out, you need to leverage the Astyanax 
CompositeColumn construct to make it work 
(https://github.com/Netflix/astyanax/wiki/Composite-columns)

I've provided a slightly modified version of what I am doing below:

CQL table def:

CREATE TABLE standard_subscription_index (
    subscription_type text,
    subscription_target_id text,
    entitytype text,
    entityid int,
    creationtimestamp timestamp,
    indexed_tenant_id uuid,
    deleted boolean,
    PRIMARY KEY ((subscription_type, subscription_target_id), entitytype, entityid)
);

ColumnFamily definition:

private static final ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn> COMPOSITE_ROW_COLUMN =
    new ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn>(
        SUBSCRIPTION_CF_NAME,
        new AnnotatedCompositeSerializer<SubscriptionIndexCompositeKey>(SubscriptionIndexCompositeKey.class),
        new AnnotatedCompositeSerializer<SubscribingEntityCompositeColumn>(SubscribingEntityCompositeColumn.class));


SubscriptionIndexCompositeKey is a class that contains the fields from the row 
key (e.g., subscription_type, subscription_target_id), and 
SubscribingEntityCompositeColumn contains the fields from the composite column 
(as it would look if you viewed your data using cassandra-cli): entityType, 
entityId, columnName. The columnName field is the tricky part, as it defines 
how to interpret the column value (e.g., if it is a value for 
creationtimestamp, the column might be "someEntityType:4:creationtimestamp").

The actual mutation looks something like this:

final MutationBatch mutation = getKeyspace().prepareMutationBatch();
final ColumnListMutation<SubscribingEntityCompositeColumn> row =
    mutation.withRow(COMPOSITE_ROW_COLUMN,
        new SubscriptionIndexCompositeKey(targetEntityType.getName(), targetEntityId));

for (Subscription sub : subs) {
    row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
        sub.getEntityId(), "creationtimestamp"), sub.getCreationTimestamp());
    row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
        sub.getEntityId(), "deleted"), sub.isDeleted());
    row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
        sub.getEntityId(), "indexed_tenant_id"), tenantId);
}

Hope that helps,
Paul


From: Keith Freeman [mailto:8fo...@gmail.com] 
Sent: Thursday, September 12, 2013 12:10 PM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

Ok, your results are pretty impressive, I'm giving it a try.  I've made some 
initial attempts to use Astyanax 1.56.37, but have some troubles:

  - it's not compatible with 1.2.8 client-side (NoSuchMethodError on 
org.apache.cassandra.thrift.TBinaryProtocol, which changed its signature since 
1.2.5)
  - even switching to C* 1.2.5 servers, it's been difficult getting simple 
examples to work unless I use CF's that have "WITH COMPACT STORAGE"

How did you handle these problems?  How much effort did it take you to switch 
from datastax to astyanax?  

I feel like I'm getting lost in a pretty deep rabbit-hole here.
On 09/11/2013 03:03 PM, Paul Cichonski wrote:
I was reluctant to use the thrift API as well, and I spent about a week trying 
to get the CQL inserts to work by partitioning the INSERTs in different ways 
and tuning the cluster.

However, nothing worked remotely as well as the batch_mutate when it came to 
writing a full wide-row at once. I think Cassandra 2.0 makes CQL work better 
for these cases (CASSANDRA-4693), but I haven't tested it yet.

-Paul

-Original Message-
From: Keith Freeman [mailto:8fo...@gmail.com]
Sent: Wednesday, September 11, 2013 1:06 PM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

Thanks, I had seen your stackoverflow post.  I've got hundreds of
(wide-) rows, and the writes are pretty well distributed across them.
I'm very reluctant to drop back to the thrift interface.

On 09/11/2013 10:46 AM, Paul Cichonski wrote:
How much of the data you are writing is going against the same row key?

I've experienced some issues using CQL to write a full wide-row at once
(across multiple threads) that exhibited some of the symptoms you have
described (i.e., high cpu, dropped mutations).

This question goes into it a bit more:
http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque
I was able to solve my issue by switching to using the thrift batch_mutate to 
write a full wide-row at once instead of using many CQL INSERT statements.

-Paul


Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-12 Thread Keith Freeman
Ok, your results are pretty impressive, I'm giving it a try.  I've made 
some initial attempts to use Astyanax 1.56.37, but have some troubles:


  - it's not compatible with 1.2.8 client-side (NoSuchMethodError on 
org.apache.cassandra.thrift.TBinaryProtocol, which changed its 
signature since 1.2.5)
  - even switching to C* 1.2.5 servers, it's been difficult getting 
simple examples to work unless I use CF's that have "WITH COMPACT STORAGE"


How did you handle these problems?  How much effort did it take you to 
switch from datastax to astyanax?


I feel like I'm getting lost in a pretty deep rabbit-hole here.

On 09/11/2013 03:03 PM, Paul Cichonski wrote:

I was reluctant to use the thrift API as well, and I spent about a week trying 
to get the CQL inserts to work by partitioning the INSERTs in different ways 
and tuning the cluster.

However, nothing worked remotely as well as the batch_mutate when it came to 
writing a full wide-row at once. I think Cassandra 2.0 makes CQL work better 
for these cases (CASSANDRA-4693), but I haven't tested it yet.

-Paul


-Original Message-
From: Keith Freeman [mailto:8fo...@gmail.com]
Sent: Wednesday, September 11, 2013 1:06 PM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

Thanks, I had seen your stackoverflow post.  I've got hundreds of
(wide-) rows, and the writes are pretty well distributed across them.
I'm very reluctant to drop back to the thrift interface.

On 09/11/2013 10:46 AM, Paul Cichonski wrote:

How much of the data you are writing is going against the same row key?

I've experienced some issues using CQL to write a full wide-row at once

(across multiple threads) that exhibited some of the symptoms you have
described (i.e., high cpu, dropped mutations).

This question goes into it a bit more:
http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque
I was able to solve my issue by switching to using the thrift batch_mutate to
write a full wide-row at once instead of using many CQL INSERT statements.

-Paul


-Original Message-
From: Keith Freeman [mailto:8fo...@gmail.com]
Sent: Wednesday, September 11, 2013 9:16 AM
To:user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage
pending


On 09/10/2013 11:42 AM, Nate McCall wrote:

With SSDs, you can turn up memtable_flush_writers - try 3 initially
(1 by default) and see what happens. However, given that there are
no entries in 'All time blocked' for such, they may be something else.

Tried that, it seems to have reduced the loads a little after
everything warmed-up, but not much.

How are you inserting the data?

A java client on a separate box using the datastax java driver, 48
threads writing 100 records each iteration as prepared batch statements.

At 5000 records/sec, the servers just can't keep up, so the client backs up.
That's only about 5 MB of data/sec, which doesn't seem like much.  As I
mentioned, switching to SSDs didn't help much, so I'm assuming at
this point that the server overloads are what's holding up the client.
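For context, the per-iteration write described above (one prepared batch of ~100 records) has roughly the following shape in CQL. The table and column names here are invented for illustration, not taken from this thread:

```sql
-- Hypothetical table. In Cassandra 1.2, BEGIN BATCH is a logged (atomic)
-- batch by default, which adds coordinator and batch-log overhead.
BEGIN BATCH
  INSERT INTO events (id, seq, payload) VALUES (?, ?, ?);
  INSERT INTO events (id, seq, payload) VALUES (?, ?, ?);
  -- ... roughly 98 more bound INSERTs per iteration ...
APPLY BATCH;
```

If the logged-batch overhead is part of the problem, BEGIN UNLOGGED BATCH skips the batch log at the cost of atomicity, which may be worth testing alongside the thrift batch_mutate approach discussed in this thread.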