sstable compression
I currently use Snappy for my SSTable compression on Cassandra 1.2.8. I would like to switch to using LZ4 compression for my SSTables. Would simply altering the table definition mean that all newly written SSTables are LZ4 and can live in harmony with the existing Snappy SSTables? Then naturally, over time, more of my data will become LZ4-compressed? Have I missed something? Thanks, Chris
Re: VMs versus Physical machines
I admit I left out details - sorry for that. The thing is that I was looking for guidance at a high level so we can then sort out ourselves what fits our requirements and use-cases (mainly because we are at a stage where they can still be molded according to hardware and software limitations/features) - for example, a recommendation like "for heavy reads physical is better".

Anyway, just to give you a quick recap:
1- Cassandra 1.2.8
2- A row is a unique userid and can have one or more columns. Every cell is basically a blob of data (using Avro). All information is in this one table. No joins or other access patterns.
3- Writes can be both in bulk (which will of course have less strict performance requirements) or real-time. All writes are per userid, hence at the row level, and consist of adding new rows (with some column values) or updating specific cells (columns) of an existing row.
4- Reads are per userid, i.e. per row, and 90% of the time are random reads for a user rather than in bulk.
5- Both read and write interfaces are exposed through a REST service as well as a direct Java client API.
6- Reads and writes, as mentioned in 3 & 4, can be for 1 or more columns at a time.

Regards,
Shahab

On Thu, Sep 12, 2013 at 1:51 AM, Aaron Turner wrote:

> On Wed, Sep 11, 2013 at 4:40 PM, Shahab Yunus wrote:
>
> > Thanks Aaron for the reply. Yes, VMs or the nodes will be in the cloud if we don't go the physical route.
> >
> > "Look how Cassandra scales and provides redundancy."
> > But how does it differ for physical machines or VMs (in the cloud)? Or, after your first comment, are you saying that there is no difference whether we use physical machines or VMs (in the cloud)?
>
> They're different, but both can and do work... VMs just require more virtual servers than going the physical route.
>
> Sorry, but without you providing any actual information about your needs, all you're going to get is generalizations and hand-waving.
RE: is the select result grouped by the value of the partition key?
Aaron, thanks for the super-rapid response. That clarifies a lot for me, but I think I am still wondering about one point, embedded below.

> From: aa...@thelastpickle.com
> Subject: Re: is the select result grouped by the value of the partition key?
> Date: Thu, 12 Sep 2013 14:19:06 +1200
> To: user@cassandra.apache.org
>
> > GROUP BY "feature",
> I would not think of it like that; this is about the physical order of rows.
>
> > since it seems really important yet does not seem to be mentioned in the CQL reference documentation.
> It's baked in; this is how the data is organised in the row.

Yes, I see, and I absolutely get the relevance of where columns are stored on disk to, say, doing INSERTs. But what I am wondering about is that, in the context of a SELECT, we seem to be relying on the Cassandra client API preserving that on-disk order while returning rows. My high-level understanding of how Cassandra handles a SELECT is (excuse incorrect terminology):

1. The client connects to some node N.
2. Node N acts as a kind of coordinator and fires off the thrift or binary-protocol messages to all other nodes to fetch rows off the memtables and/or disks.
3. The coordinator merges, truncates, etc. the sets from the nodes and returns one answer set to the client.

It is step 3 which has me wondering - does it explicitly preserve the on-disk order? In fact, does it simply keep each individual node's answer set separate? Is that how it works?

> http://www.datastax.com/dev/blog/thrift-to-cql3
> We often say the PRIMARY KEY is the PARTITION KEY and the GROUPING COLUMNS
> http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/cql_reference/create_table_r.html
>
> See also http://thelastpickle.com/blog/2013/01/11/primary-keys-in-cql.html
>
> > Is it something we can bet the farm and farmer's family on?
> Sure.
>
> > The kinds of scenarios where I am wondering if it's possible for partition-key groups to get intermingled are:
> All instances of the table entity with the same value(s) for the PARTITION KEY portion of the PRIMARY KEY exist in the same storage engine row.
>
> > . what if the node containing the primary copy of a row is down
> There is no primary copy of a row.
>
> > . what if there is a heavy stream of UPDATE activity from applications which connect to all nodes, causing different nodes to have different versions of replicas of the same row?
> That's fine with me. It's only an issue when the data is read, and at that point the Consistency Level determines what we do.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 12/09/2013, at 7:43 AM, John Lumby <johnlu...@hotmail.com> wrote:
>
> > I would like to make quite sure about this implicit GROUP BY "feature", since it seems really important yet does not seem to be mentioned in the CQL reference documentation.
> >
> > Aaron, you said "yes" -- is that "yes, always, in all scenarios, no matter what" or "yes, usually"? Is it something we can bet the farm and farmer's family on?
> >
> > The kinds of scenarios where I am wondering if it's possible for partition-key groups to get intermingled are:
> >
> > . what if the node containing the primary copy of a row is down and cassandra fetches this row from a replica on a different node (e.g. with CONSISTENCY ONE)
> > . what if there is a heavy stream of UPDATE activity from applications which connect to all nodes, causing different nodes to have different versions of replicas of the same row?
> >
> > Can you point me to some place in the cassandra source code where this grouping is ensured?
> >
> > Many thanks,
> > John Lumby
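[Editor's note: to make the guarantee under discussion concrete, here is a minimal sketch with a hypothetical table and column names. Rows sharing a partition key live in one storage-engine row on each replica, so a SELECT returns them as one contiguous group ordered by the clustering column; ordering *between* partitions follows token order, which looks arbitrary under RandomPartitioner/Murmur3Partitioner.]

    CREATE TABLE events (
        user_id text,            -- partition key: all rows for a user are stored together
        event_time timestamp,    -- clustering column: on-disk sort order within the partition
        payload text,
        PRIMARY KEY (user_id, event_time)
    );

    SELECT * FROM events LIMIT 100;
    -- every row for a given user_id comes back contiguously, sorted by
    -- event_time; the coordinator merges per-node result sets partition by
    -- partition, so partition-key groups are never interleaved, whichever
    -- replicas happen to answer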
Eternal HintedHandoffs
Hi all! According to the DataStax blog (http://www.datastax.com/dev/blog/modern-hinted-handoff), once a dead node is up again, all the nodes holding hints for it start sending them. However, it is also written that they check every 10 minutes for hints whose delivery timed out. Do they behave like that forever? Do I need to restart the nodes? I'm concerned, since the load on the nodes shows small spikes every 10 minutes... I'm using Cassandra 1.2.1. Regards, Francisco.
Re: VMs versus Physical machines
On Thu, Sep 12, 2013 at 5:42 AM, Shahab Yunus wrote:

> I admit I left out details - sorry for that. The thing is that I was looking for guidance at a high level so we can then sort out ourselves what fits our requirements and use-cases (mainly because we are at a stage where they can still be molded according to hardware and software limitations/features) - for example, a recommendation like "for heavy reads physical is better".
>
> Anyway, just to give you a quick recap:
> 1- Cassandra 1.2.8
> 2- A row is a unique userid and can have one or more columns. Every cell is basically a blob of data (using Avro). All information is in this one table. No joins or other access patterns.
> 3- Writes can be both in bulk (which will of course have less strict performance requirements) or real-time. All writes are per userid, hence at the row level, and consist of adding new rows (with some column values) or updating specific cells (columns) of an existing row.
> 4- Reads are per userid, i.e. per row, and 90% of the time are random reads for a user rather than in bulk.
> 5- Both read and write interfaces are exposed through a REST service as well as a direct Java client API.
> 6- Reads and writes, as mentioned in 3 & 4, can be for 1 or more columns at a time.
>
> Regards,
> Shahab

Your total data set size and the number of reads/writes per second are the important things here. Also, how sensitive are you to latency spikes (which tend to happen with VMs)?

Long story short, the safest option is always physical IMHO. Use VMs/cloud if you need to for some reason (for example, because all the other servers talking to Cassandra are also in AWS). Cloud can work (Netflix runs Cassandra on AWS), but your performance will be a lot more consistent on physical hardware, and Cassandra, like all databases, likes lots of RAM (although this can be offset somewhat with SSDs), which tends to be expensive in the cloud.

--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
https://github.com/synfinatic/tcpreplay - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
Re: sstable compression
On Thu, Sep 12, 2013 at 2:13 AM, Christopher Wirt wrote:

> I would like to switch to using LZ4 compression for my SSTables. Would simply altering the table definition mean that all newly written SSTables are LZ4 and can live in harmony with the existing Snappy SSTables?

Yes. Per Aleksey in #cassandra @ freenode, the compressor is stored in each SSTable's meta-information. This means that the compressor in the table definition is *only* the compressor used for newly written SSTables; existing SSTables keep whatever compressor they were written with.

=Rob
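[Editor's note: to make the accepted answer concrete, a minimal sketch - keyspace and table names are hypothetical; in the 1.2 line the relevant option key is 'sstable_compression'.]

    ALTER TABLE mykeyspace.mytable
        WITH compression = { 'sstable_compression' : 'LZ4Compressor' };

After this, newly flushed or compacted SSTables use LZ4 while the old Snappy ones stay readable. If you want old data rewritten immediately rather than waiting for normal compaction, running 'nodetool upgradesstables -a' on each node should do it, though that is worth verifying on your version.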
RE: heavy insert load overloads CPUs, with MutationStage pending
I'm running Cassandra 1.2.6 without compact storage on my tables. The trick is making your Astyanax mutation (I'm running 1.56.42) work with the CQL table definition. This is definitely a bit of a hack, since most of the advice says don't mix the CQL and Thrift APIs, so it is your call on how far you want to go. If you still want to try it out, you need to leverage the Astyanax CompositeColumn construct to make it work (https://github.com/Netflix/astyanax/wiki/Composite-columns).

I've provided a slightly modified version of what I am doing below.

CQL table def:

    CREATE TABLE standard_subscription_index (
        subscription_type text,
        subscription_target_id text,
        entitytype text,
        entityid int,
        creationtimestamp timestamp,
        indexed_tenant_id uuid,
        deleted boolean,
        PRIMARY KEY ((subscription_type, subscription_target_id), entitytype, entityid)
    );

ColumnFamily definition (the generic parameters were mangled in transit; this restores them to match the serializers):

    private static final ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn> COMPOSITE_ROW_COLUMN =
        new ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn>(
            SUBSCRIPTION_CF_NAME,
            new AnnotatedCompositeSerializer<SubscriptionIndexCompositeKey>(SubscriptionIndexCompositeKey.class),
            new AnnotatedCompositeSerializer<SubscribingEntityCompositeColumn>(SubscribingEntityCompositeColumn.class));

SubscriptionIndexCompositeKey is a class that contains the fields from the row key (e.g., subscription_type, subscription_target_id), and SubscribingEntityCompositeColumn contains the fields from the composite column (as it would look if you viewed your data using cassandra-cli), so: entityType, entityId, columnName. The columnName field is the tricky part, as it defines what to interpret the column value as (i.e., if it is a value for the creationtimestamp, the column might be "someEntityType:4:creationtimestamp").

The actual mutation looks something like this:

    final MutationBatch mutation = getKeyspace().prepareMutationBatch();
    final ColumnListMutation<SubscribingEntityCompositeColumn> row = mutation.withRow(COMPOSITE_ROW_COLUMN,
        new SubscriptionIndexCompositeKey(targetEntityType.getName(), targetEntityId));
    for (Subscription sub : subs) {
        row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
            sub.getEntityId(), "creationtimestamp"), sub.getCreationTimestamp());
        row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
            sub.getEntityId(), "deleted"), sub.isDeleted());
        row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(),
            sub.getEntityId(), "indexed_tenant_id"), tenantId);
    }

Hope that helps,
Paul

From: Keith Freeman [mailto:8fo...@gmail.com]
Sent: Thursday, September 12, 2013 12:10 PM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

Ok, your results are pretty impressive; I'm giving it a try. I've made some initial attempts to use Astyanax 1.56.37, but have some troubles:

- it's not compatible with 1.2.8 client-side (NoSuchMethodError on org.apache.cassandra.thrift.TBinaryProtocol, which changed its signature since 1.2.5)
- even switching to C* 1.2.5 servers, it's been difficult getting simple examples to work unless I use CFs that have "WITH COMPACT STORAGE"

How did you handle these problems? How much effort did it take you to switch from datastax to astyanax? I feel like I'm getting lost in a pretty deep rabbit-hole here.

On 09/11/2013 03:03 PM, Paul Cichonski wrote:

I was reluctant to use thrift as well, and I spent about a week trying to get the CQL inserts to work by partitioning the INSERTs in different ways and tuning the cluster. However, nothing worked remotely as well as batch_mutate when it came to writing a full wide-row at once.

I think Cassandra 2.0 makes CQL work better for these cases (CASSANDRA-4693), but I haven't tested it yet.

-Paul

-----Original Message-----
From: Keith Freeman [mailto:8fo...@gmail.com]
Sent: Wednesday, September 11, 2013 1:06 PM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

Thanks, I had seen your stackoverflow post. I've got hundreds of (wide-) rows, and the writes are pretty well distributed across them. I'm very reluctant to drop back to the thrift interface.

On 09/11/2013 10:46 AM, Paul Cichonski wrote:

How much of the data you are writing is going against the same row key?

I've experienced some issues using CQL to write a full wide-row at once (across multiple threads) that exhibited some of the symptoms you have described (i.e., high cpu, dropped mutations). This question goes into it a bit more: http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque . I was able to solve my issue by switching to using the thrift batch_mutate to write a full wide-row at once instead of using many CQL INSERT statements.

-Paul

-----Original Message-----
From: Keith Freeman [
Re: heavy insert load overloads CPUs, with MutationStage pending
Ok, your results are pretty impressive; I'm giving it a try. I've made some initial attempts to use Astyanax 1.56.37, but have some troubles:

- it's not compatible with 1.2.8 client-side (NoSuchMethodError on org.apache.cassandra.thrift.TBinaryProtocol, which changed its signature since 1.2.5)
- even switching to C* 1.2.5 servers, it's been difficult getting simple examples to work unless I use CFs that have "WITH COMPACT STORAGE"

How did you handle these problems? How much effort did it take you to switch from datastax to astyanax? I feel like I'm getting lost in a pretty deep rabbit-hole here.

On 09/11/2013 03:03 PM, Paul Cichonski wrote:

I was reluctant to use thrift as well, and I spent about a week trying to get the CQL inserts to work by partitioning the INSERTs in different ways and tuning the cluster. However, nothing worked remotely as well as batch_mutate when it came to writing a full wide-row at once.

I think Cassandra 2.0 makes CQL work better for these cases (CASSANDRA-4693), but I haven't tested it yet.

-Paul

-----Original Message-----
From: Keith Freeman [mailto:8fo...@gmail.com]
Sent: Wednesday, September 11, 2013 1:06 PM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

Thanks, I had seen your stackoverflow post. I've got hundreds of (wide-) rows, and the writes are pretty well distributed across them. I'm very reluctant to drop back to the thrift interface.

On 09/11/2013 10:46 AM, Paul Cichonski wrote:

How much of the data you are writing is going against the same row key?

I've experienced some issues using CQL to write a full wide-row at once (across multiple threads) that exhibited some of the symptoms you have described (i.e., high cpu, dropped mutations). This question goes into it a bit more: http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque . I was able to solve my issue by switching to using the thrift batch_mutate to write a full wide-row at once instead of using many CQL INSERT statements.

-Paul

-----Original Message-----
From: Keith Freeman [mailto:8fo...@gmail.com]
Sent: Wednesday, September 11, 2013 9:16 AM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

On 09/10/2013 11:42 AM, Nate McCall wrote:

With SSDs, you can turn up memtable_flush_writers - try 3 initially (1 by default) and see what happens. However, given that there are no entries in 'All time blocked' for such, it may be something else.

Tried that; it seems to have reduced the loads a little after everything warmed up, but not much.

How are you inserting the data?

A java client on a separate box using the datastax java driver: 48 threads writing 100 records each iteration as prepared batch statements. At 5000 records/sec, the servers just can't keep up, so the client backs up. That's only 5MB of data/sec, which doesn't seem like much. As I mentioned, switching to SSDs didn't help much, so I'm assuming at this point that the server overloads are what's holding up the client.
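[Editor's note: for context on what batch_mutate was replacing, the CQL-side approach under discussion looks roughly like the sketch below, with a hypothetical table. Per the thread, in 1.2 issuing one INSERT per cell of a wide row - even in an unlogged batch, which saves round trips but still executes each statement individually - performed far worse than a single thrift batch_mutate, with CASSANDRA-4693 in 2.0 cited as the improvement.]

    -- hypothetical wide-row table: one partition per sensor
    CREATE TABLE sensor_data (
        sensor_id text,
        reading_time timestamp,
        value double,
        PRIMARY KEY (sensor_id, reading_time)
    );

    -- writing a full wide row takes one INSERT per clustering key
    BEGIN UNLOGGED BATCH
    INSERT INTO sensor_data (sensor_id, reading_time, value) VALUES ('s1', '2013-09-11 09:00:00', 1.0);
    INSERT INTO sensor_data (sensor_id, reading_time, value) VALUES ('s1', '2013-09-11 09:00:01', 1.1);
    -- ...hundreds more rows for the same sensor_id
    APPLY BATCH;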