In-place vnode conversion possible?

2014-12-16 Thread Jonas Borgström
Hi, I know that adding a new vnode-enabled DC is the recommended method to convert an existing cluster to vnodes, and that the cassandra-shuffle utility has been removed. That said, I've done some testing and it appears to be possible to perform an in-place conversion as long as all nodes contain

Understanding what is key and partition key

2014-12-16 Thread Chamila Wijayarathna
Hello all, I have read a lot about Cassandra, including about key-value pairs, partition keys, clustering keys, etc. Does the "key" in a key-value pair refer to the same thing as the partition key, or are they different? CREATE TABLE corpus.bigram_time_category_ordered_frequency ( id bigint, word1 va

Re: Understanding what is key and partition key

2014-12-16 Thread Jack Krupansky
Correction: year and category form a “composite partition key”. frequency, word1, and word2 are “clustering columns”. The combination of a partition key with clustering columns is a “compound primary key”. Every CQL row will have a partition key by definition, and may optionally have clusterin
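Putting that correction into DDL form, the table in question presumably looks something like this (a sketch reconstructed from the column list quoted in this thread, not the original poster's exact statement):

    CREATE TABLE corpus.bigram_time_category_ordered_frequency (
        id bigint,
        word1 varchar,
        word2 varchar,
        year int,
        category varchar,
        frequency int,
        PRIMARY KEY ((year, category), frequency, word1, word2)
    );
    -- (year, category): composite partition key
    -- frequency, word1, word2: clustering columns
    -- the whole PRIMARY KEY: a compound primary key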

Re: Understanding what is key and partition key

2014-12-16 Thread Chamila Wijayarathna
Hi Jack, So what will be the keys and values of the following CF instance?

 year | category | frequency | word1  | word2 | id
------+----------+-----------+--------+-------+-----
 2014 | N        | 1         | සියළුම | යුද්ධ  | 664
 2014 |

Re: Understanding what is key and partition key

2014-12-16 Thread Jens Rantil
For the first row, the key is: (2014, N, 1, සියළුම, යුද්ධ) and the value-part is (664). Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter On Tue, Dec 16, 2014 at 2:25 PM, Chamila Wijayarathn

Re: Understanding what is key and partition key

2014-12-16 Thread Chamila Wijayarathna
Hi Jens, Thank You! On Tue, Dec 16, 2014 at 7:03 PM, Jens Rantil wrote: > > For the first row, the key is: (2014, N, 1, සියළුම, යුද්ධ) and the > value-part is (664). > > Cheers, > Jens > > ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se > Phone: +46 708 84 18 32 Web: www.tink

Defining DataSet.json for cassandra-unit testing

2014-12-16 Thread Chamila Wijayarathna
Hello all, I am trying to test my application using cassandra-unit with the schema and data given below. CREATE TABLE corpus.bigram_time_category_ordered_frequency ( id bigint, word1 varchar, word2 varchar, year int, category varchar, frequency int, PRIMARY KEY((

Re: batch_size_warn_threshold_in_kb

2014-12-16 Thread Eric Stevens
> You are, of course, free to use batches in your application I'm not looking to justify the use of batches, I'm looking for the path forward that will give us the Best Results™ both near and long term, for some definition of Best (which would be a balance of client throughput and cluster pressure

does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
Howdy all, Our use of Cassandra unfortunately involves lots of deletes. Yes, I know that C* is not well suited to this kind of workload, but that's where we are, and before I go looking for an entirely new data layer I would rather explore whether C* could be tuned to work well for us. Howev

Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Eric Stevens
No, deletes are always written as tombstones no matter the consistency level. This is because data at rest lives in SSTables, which are immutable once written. The tombstone marks that a record in another sstable is now deleted, and so a read of that value should be treated as if it doesn't exist.

Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Robert Wille
Tombstones have to be created. The SSTables are immutable, so the data cannot be deleted. Therefore, a tombstone is required. The value you deleted will be physically removed during compaction. My workload sounds similar to yours in some respects, and I was able to get C* working for me. I have

Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
Ah, makes sense. Thanks for the explanations! - Ian On Tue, Dec 16, 2014 at 10:53 AM, Robert Wille wrote: > > Tombstones have to be created. The SSTables are immutable, so the data > cannot be deleted. Therefore, a tombstone is required. The value you > deleted will be physically removed duri

Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Jack Krupansky
When you say “no need for tombstones”, did you actually read that somewhere or were you just speculating? If the former, where exactly? -- Jack Krupansky From: Ian Rose Sent: Tuesday, December 16, 2014 10:22 AM To: user Subject: does consistency=ALL for deletes obviate the need for tombstones?

Re: Hinted handoff not working

2014-12-16 Thread Robert Wille
Nope. I added millions of records and several GB to the cluster while one node was down, and then ran "nodetool flush system hints" on a couple of nodes that were up, and system/hints has less than 200K in it. Here’s the relevant part of "nodetool cfstats system.hints": Keyspace: system

Re: Cassandra Maintenance Best practices

2014-12-16 Thread Neha Trivedi
Hi Jonathan, QUORUM = (sum_of_replication_factors / 2) + 1, so for us quorum = (2/2) + 1 = 2. Default CL is ONE and RF=2 with two nodes in the cluster. (I am a little confused: what is my read CL and what is my write CL?) So, does it mean that for every WRITE it will write to both nodes? And for eve

Comprehensive documentation on Cassandra Data modelling

2014-12-16 Thread Jason Kania
Hi, I have been having a few exchanges with contributors to the project around what is possible with Cassandra and a common response that comes up when I describe functionality as broken or missing is that I am not modelling my data correctly. Unfortunately, I cannot seem to find comprehensive d

Re: Cassandra Maintenance Best practices

2014-12-16 Thread Ryan Svihla
CL QUORUM with RF=2 is equivalent to ALL: writes will require acknowledgement from both nodes, and reads will come from both nodes. CL ONE will still write to both replicas, but returns success as soon as the first one responds; reads will be from one node (the load balancing strategy determines which one). FW
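Spelled out, the arithmetic behind that is quorum = floor(RF / 2) + 1:

    RF = 2: quorum = floor(2 / 2) + 1 = 2   (all replicas; no node may be down)
    RF = 3: quorum = floor(3 / 2) + 1 = 2   (one replica may be down)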

Re: Cassandra Maintenance Best practices

2014-12-16 Thread Neha Trivedi
Thanks Ryan. So, as Jonathan recommended, we should have RF=3 with three nodes. Then quorum = 2, so CL = 2 (i.e., I need the CL set to QUORUM), and I will not need the downgrading retry policy in case one node goes down. I can dynamically add a new node to my cluster. Can I change my RF to 3,

Re: Cassandra Maintenance Best practices

2014-12-16 Thread Ryan Svihla
You'll have to run repair, and that will involve some load and streaming, but this is a normal use case for Cassandra, and your cluster should be sized load-wise to allow repair and bootstrapping of new nodes; otherwise, when you're overwhelmed, you won't be able to add more nodes easily. If you ne
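The RF change itself is a one-liner followed by repair; a minimal sketch, assuming the keyspace name and SimpleStrategy (both hypothetical here):

    ALTER KEYSPACE mykeyspace
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

    -- then, on each node in turn:
    --   nodetool repair mykeyspace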

Re: Cassandra Maintenance Best practices

2014-12-16 Thread Neha Trivedi
Thanks Ryan. We will get a new node and add it to the cluster. I will mail if I have any question regarding the same. On Tue, Dec 16, 2014 at 10:52 PM, Ryan Svihla wrote: > > you'll have to run repair and that will involve some load and streaming, > but this is a normal use case for cassandra..a

Re: Comprehensive documentation on Cassandra Data modelling

2014-12-16 Thread Ryan Svihla
Data modeling for a distributed application could be a book unto itself. However, I will add that modeling by restriction is basically the entire thought process in Cassandra data modeling: since it's a distributed hash table, a core aspect of that sort of application is that you need to be able to quickly lo
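As a tiny illustration of that query-first mindset (table and column names hypothetical): rather than normalizing, you create one table per access pattern, keyed so each query is a single-partition lookup.

    CREATE TABLE users_by_email (
        email text PRIMARY KEY,
        user_id uuid,
        name text
    );

    -- SELECT user_id, name FROM users_by_email WHERE email = ?;
    -- restricts on the full partition key, so it hits exactly one partition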

Re: Defining DataSet.json for cassandra-unit testing

2014-12-16 Thread Ryan Svihla
I'd ask the author of cassandra-unit. I've not personally used that project. On Tue, Dec 16, 2014 at 8:00 AM, Chamila Wijayarathna < cdwijayarat...@gmail.com> wrote: > > Hello all, > > I am trying to test my application using cassandra-unit with following > schema and data given below. > > CREATE

Re: Changing replication factor of Cassandra cluster

2014-12-16 Thread Ryan Svihla
Repair's performance is going to vary heavily with a large number of factors; hours for one node to finish is within the range of what I see in the wild. Again, there are so many factors it's impossible to speculate on whether that is good or bad for your cluster. Factors that matter include: 1. speed of dis

Re: Comprehensive documentation on Cassandra Data modelling

2014-12-16 Thread Jason Kania
Ryan, Thanks for the response. It offers a bit more clarity. I think a series of blog posts with good real-world examples would go a long way toward increasing the usability of Cassandra. Right now I find the process like going through a minefield, because I only discover what is not possible after tryi

Re: Comprehensive documentation on Cassandra Data modelling

2014-12-16 Thread Ryan Svihla
There is a lot of stuff out there, and the best thing you can do today is watch Patrick McFadin's series. This was what I used before I started at DataStax. Planet Cassandra has a data modeling playlist of videos you can watch https://www.youtube.com/playlist?list=PLqcm6qE9lgKJoSWKYWHWhrVupRbS8

Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
I was speculating. From the responses above, it now appears to me that tombstones serve (at least) 2 distinct roles: 1. When reading within a single cassandra instance, they mark a new version of a value (that value being "deleted"). Without this, the prior version would be the most recent and s

100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
I have a three node cluster where each node has been sitting at a load of 4, 100% CPU utilization (although 92% nice), for the last 12 hours, ever since some significant writes finished. I'm trying to determine what tuning I should be doing to get it out of this state. The debug log is just an

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Jonathan Lacefield
Hello, What version of Cassandra are you running? If it's 2.0, we recently experienced something similar with 8447 [1], which 8485 [2] should hopefully resolve. Please note that 8447 is not related to tombstones. Tombstone processing can put a lot of pressure on the heap as well. Why do y

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
What's heap usage at? On Tue, Dec 16, 2014 at 1:04 PM, Arne Claassen wrote: > > I have a three node cluster that has been sitting at a load of 4 (for each > node), 100% CPU utilization (although 92% nice) for the last 12 hours, > ever since some significant writes finished. I'm trying to determi

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
I'm running 2.0.10. The data is all time series data, and as we change our pipeline, we've been periodically reprocessing the data sources, which causes each time series to be overwritten, i.e. every row per partition key is deleted and re-written, so I assume I've been collecting a bunch of t

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
What's CPU, RAM, storage layer, and data density per node? Exact heap settings would be nice. In the logs, look for TombstoneOverwhelmingException. On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen wrote: > > I'm running 2.0.10. > > The data is all time series data and as we change our pipeline, we've

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
AWS r3.xlarge, 30GB RAM, but only using a heap of 10GB and a new gen of 2GB, because we might go c3.2xlarge instead if CPU is more important than RAM. Storage is EBS-optimized SSD (but iostat shows no real IO going on). Each node only has about 10GB of data, with ownership of 67%, 64.7% & 68.3%. The node on which I set the H

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Sorry, I meant a 15GB heap on the one machine that has less nice CPU% now. The others are 6GB. On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen wrote: > > AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because we > might go c3.2xlarge instead if CPU is more important than RAM > Storage i

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Changed the 15GB node to a 25GB heap, and the nice CPU is down to ~20% now. Checked my dev cluster to see if the ParNew log entries are just par for the course, but I'm not seeing them there. However, both have the following every 30 seconds: DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManage

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So a heap of that size without some tuning will create a number of problems (high CPU usage being one of them). I suggest either an 8GB heap and 400MB parnew (which I'd only set that low for that low a CPU count), or attempt the tunings indicated in https://issues.apache.org/jira/browse/CASSANDRA-8150 On T
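In cassandra-env.sh terms, that first suggestion would look roughly like the following (these values are Ryan's suggestion for this particular cluster, not a general recommendation):

    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="400M"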

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
Also, based on the replayed batches: are you using batches to load data? On Tue, Dec 16, 2014 at 3:12 PM, Ryan Svihla wrote: > > So heap of that size without some tuning will create a number of problems > (high cpu usage one of them), I suggest either 8GB heap and 400mb parnew > (which I'd only set th

Best Time Series insert strategy

2014-12-16 Thread Arne Claassen
I have a time series table consisting of frame information for media. The table is partitioned on the media ID and uses time and some other frame-level keys as clustering keys, i.e. all frames for one piece of media are really one column family "row", even though it is represented in CQL as an ordered
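The shape being described is roughly the following (a sketch with hypothetical column names; per a later message in this thread, the real key also includes a track id and other frame-level columns):

    CREATE TABLE frames (
        media_id uuid,
        frame_time timestamp,
        frame_data blob,
        PRIMARY KEY ((media_id), frame_time)
    );
    -- one partition per media_id, rows ordered by frame_time within it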

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
The starting configuration I had, which is still running on two of the nodes, was a 6GB heap and 1024MB parnew, which is close to what you are suggesting, and those have been pegged at load 4 for over 12 hours with hardly any read or write traffic. I will set one to 8GB/400MB and see if its load chang

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So 1024 is still a good 2.5 times what I'm suggesting, and 6GB is hardly enough to run Cassandra well, especially if you're going full bore on loads. However, you may just flat out be CPU bound on your write throughput; how many TPS and what size writes do you have? Also, what is your widest row?

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Actually not sure why the machine was originally configured at 6GB since we even started it on an r3.large with 15GB. Re: Batches Not using batches. I actually have that as a separate question on the list. Currently I fan out async single inserts and I'm wondering if batches are better since my d

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
Can you define "virtually no traffic"? Sorry to be repetitive about that, but I've worked on a lot of clusters in the past year and people have wildly different ideas of what that means. Unlogged batches of the same partition key are definitely a performance optimization. Typically async is much
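A sketch of that optimization, reusing the hypothetical frames table from earlier in this digest: an unlogged batch in which every statement targets the same partition key, so the whole batch becomes a single mutation to one replica set.

    BEGIN UNLOGGED BATCH
        INSERT INTO frames (media_id, frame_time, frame_data) VALUES (?, ?, ?);
        INSERT INTO frames (media_id, frame_time, frame_data) VALUES (?, ?, ?);
    APPLY BATCH;
    -- the same media_id is bound in every statement; mixing partition keys
    -- in an unlogged batch loses the benefit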

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
No problem with the follow-up questions. I'm on a crash course here trying to understand what makes C* tick, so I appreciate all feedback. We reprocessed all media (1200 partition keys) last night, where partition keys had somewhere between 4k and 200k "rows". After that completed, no traffic went t

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
OK, based on those numbers I have a theory. Can you show me nodetool tpstats for all 3 nodes? On Tue, Dec 16, 2014 at 4:04 PM, Arne Claassen wrote: > > No problem with the follow up questions. I'm on a crash course here trying > to understand what makes C* tick so I appreciate all feedback. > > W

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Of course QA decided to start a test batch (still relatively low traffic), so I hope it doesn't throw the tpstats off too much. Node 1:

Pool Name                Active   Pending   Completed   Blocked   All time blocked
MutationStage                 0         0    13804928         0

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So you've got some blocked flush writers, but you have an incredibly large number of dropped mutations. Are you using secondary indexes, and if so, how many? What is your flush queue set to? On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen wrote: > > Of course QA decided to start a test batch (still r

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Not using any secondary indices, and memtable_flush_queue_size is the default 4. But let me tell you how data is "mutated" right now; maybe that will give you an insight into how this is happening. Basically the frame data table has the following primary key: PRIMARY KEY ((id), trackid, "timestamp
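From that description, each reprocessing pass presumably does something like the following per partition (a simplified sketch on the hypothetical frames table; the real table has more clustering columns):

    -- a delete restricted only by partition key writes a single
    -- partition-level tombstone, rather than one tombstone per row:
    DELETE FROM frames WHERE media_id = 00000000-0000-0000-0000-000000000001;

    -- then the recomputed rows are re-inserted:
    INSERT INTO frames (media_id, frame_time, frame_data)
    VALUES (00000000-0000-0000-0000-000000000001, '2014-12-16 00:00:00', 0x00);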

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So a delete is really another write that lives for gc_grace_seconds (default 10 days); if you get enough tombstones, it can make managing your cluster a challenge as is. Open up cqlsh, turn on tracing, and try a few queries: how many tombstones are scanned for a given query? It's possible the heap problems you'
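In cqlsh that check looks like this (the trace line shown is paraphrased; the exact wording varies by version):

    TRACING ON;
    SELECT * FROM frames WHERE media_id = 00000000-0000-0000-0000-000000000001;

    -- among the trace events, look for lines of the form:
    --   Read 100 live and 3000 tombstoned cells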

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
I just did a wide set of selects and ran across no tombstones. But while on the subject of gc_grace_seconds: any reason, on a small cluster, not to set it to something low like a single day? It seems like 10 days is only needed for large clusters undergoing long partition splits, or am I misundersta
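For reference, gc_grace_seconds is a per-table setting; lowering it to one day would look like this (table name hypothetical):

    ALTER TABLE frames WITH gc_grace_seconds = 86400;  -- 86400 s = 1 day

The usual caveat: every node must be successfully repaired at least once inside that window, or a node that missed the delete can resurrect the data.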

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
Manual forced compactions create more problems than they solve. If you have no evidence of tombstones in your selects (which seems odd; can you share some of the tracing output?), then I'm not sure what it would solve for you. A running compaction could explain a high load; log messages with ERRORS

Questions about bootstrapping and compactions during bootstrapping

2014-12-16 Thread Donald Smith
Looking at the output of "nodetool netstats", I see that the bootstrapping nodes are pulling from only two of the nine nodes currently in the datacenter. That surprises me: I'd think the vnodes it pulls from would be randomly spread across the existing nodes. We're using Cassandra 2.0.11 with 256

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
That's just the thing. There is nothing in the logs except the constant ParNew collections like DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888 But the load is staying continuously high. T

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
What version of Cassandra? On Dec 16, 2014 6:36 PM, "Arne Claassen" wrote: > That's just the thing. There is nothing in the logs except the constant > ParNew collections like > > DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line > 118) GC for ParNew: 166 ms for 10 collection

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Cassandra 2.0.10 and Datastax Java Driver 2.1.1 On Dec 16, 2014, at 4:48 PM, Ryan Svihla wrote: > What version of Cassandra? > > On Dec 16, 2014 6:36 PM, "Arne Claassen" wrote: > That's just the thing. There is nothing in the logs except the constant > ParNew collections like > > DEBUG [Sche

[Consistency on cqlsh command prompt]

2014-12-16 Thread nitin padalia
Hi, when I set consistency to QUORUM in the cqlsh command line, it says consistency is set to QUORUM: cqlsh:testdb> CONSISTENCY QUORUM ; Consistency level set to QUORUM. However, when I check it back using the CONSISTENCY command at the prompt, it says consistency is 4, whereas it should be 2 as my replic

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Jens Rantil
Maybe checking which thread(s) are hot would hint at what's going on? (see http://www.boxjar.com/using-top-and-jstack-to-find-the-java-thread-that-is-hogging-the-cpu/). On Wed, Dec 17, 2014 at 1:51 AM, Arne Claassen wrote: > Cassandra 2.0.10 and Datastax Java Driver 2.1.1 > On Dec 16, 2014, at 4:48 PM, Ry
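The commands behind that technique, roughly (PIDs hypothetical; jstack reports thread IDs in hex as nid=0x...):

    top -H -p <cassandra-pid>        # per-thread view; note the hottest TID
    printf '%x\n' <tid>              # convert that TID to hex
    jstack <cassandra-pid> | grep -A 20 'nid=0x<tid-hex>'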