Re: Routine nodetool repair
Here you go http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 21/12/2011, at 2:44 PM, Blake Starkenburg wrote: > I have been playing around with Cassandra for a few months now. Starting to > explore more of the routine maintenance and backup strategies and I have a > general question about nodetool repair. After reading the following page: > http://www.datastax.com/docs/0.8/operations/cluster_management it has > occurred to me that for these past few months I have NOT DONE any cleanup or > repair commands on a test 2-node cluster (and there have been quite a few > deletes, writes, etc.). > > For some reason I was under the assumption that Cassandra handled the > tombstone records from deletes automatically? Should I still run nodetool > repair and if so, what about old deletes which occurred months ago? > > Thank You!
Re: Can I slice on composite indexes?
You can slice the "key1" row to get the columns that have "xyz" as the value for the first component in the column name. Check the docs in your client for how to do that. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 21/12/2011, at 3:04 PM, Maxim Potekhin wrote: > Let's say I have rows with composite columns Like > > ("key1", {('xyz', 'abc'): 'colval1'}, {('xyz', 'def'): 'colval2'}) > ("key2", {('ble', 'meh'): 'otherval'}) > > Is it possible to create a composite type index such that I can query on 'xyz' > and get the first two columns? > > Thanks > > Maxim >
Re: Doubts related to composite type column names/values
On Tue, Dec 20, 2011 at 9:33 PM, Maxim Potekhin wrote: > Thank you Aaron! As long as I have plain strings, would you say that I would > do almost as well with catenation? Not without a concatenation-aware comparator. The padding Aaron is talking about is not only a mixed-type problem. What I mean here is that if you use a simple string comparator (UTF8Type, AsciiType or even BytesType), then you will have the following sorting: "foo24:bar" "foo:bar" "foobar:bar" because ':' is between '2' and 'b' in ascii. You could use another separator, but you get the point. In other words, concatenating strings doesn't make the comparator aware of the components. CompositeType on the other hand sorts each component separately, so it will sort: "foo" : "bar" "foo24" : "bar" "foobar" : "bar" which is usually what you want. -- Sylvain > > Of course I realize that mixed types are a very different case where the > composite is very useful. > > Thanks > > Maxim > > > > On 12/20/2011 2:44 PM, aaron morton wrote: > > Component values are compared in a type aware fashion, an Integer is an > Integer. Not a 10 character zero padded string. > > You can also slice on the components. Just like with string concat, but > nicer, e.g. If your app is storing comments for a thing, and the column > names have the form or you can slice > for all properties of a comment or all properties for comments between two > comment_id's > > Finally, the client library knows what's going on. > > Hope that helps. > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 21/12/2011, at 7:43 AM, Maxim Potekhin wrote: > > With regards to static, what are the major benefits as compared with > string catenation (with some convenient separator inserted)? > > Thanks > > Maxim > > > On 12/20/2011 1:39 PM, Richard Low wrote: > > On Tue, Dec 20, 2011 at 5:28 PM, Ertio Lew wrote: > > With regard to the composite columns stuff in Cassandra, I have the > > following doubts : > > > 1. What is the storage overhead of the composite type column names/values, > > The values are the same. For each dimension, there is 3 bytes overhead. > > > 2. what exactly is the difference between the DynamicComposite and Static > > Composite ? > > Static composite type has the types of each dimension specified in the > > column family definition, so all names within that column family have > > the same type. Dynamic composite type lets you specify the type for > > each column, so they can be different. There is extra storage > > overhead for this and care must be taken to ensure all column names > > remain comparable. > > > > >
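Sylvain's two orderings can be reproduced in a few lines of Python. Sorting tuples element by element is only a rough model of a CompositeType comparator (Cassandra actually compares an encoded byte form), but it shows why plain concatenation sorts differently:

```python
# Concatenated keys: the separator byte ':' (0x3A) is compared against
# data bytes, and it happens to sort between '2' and 'b'.
concatenated = ["foo24:bar", "foo:bar", "foobar:bar"]

# Component-wise keys: tuples compare element by element, a rough model
# of CompositeType's per-component comparison.
composite = [("foo24", "bar"), ("foo", "bar"), ("foobar", "bar")]

print(sorted(concatenated))  # ['foo24:bar', 'foo:bar', 'foobar:bar']
print(sorted(composite))     # [('foo', 'bar'), ('foo24', 'bar'), ('foobar', 'bar')]
```

The concatenated order is surprising ("foo" does not sort first); the component-wise order is usually what you want.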
Re: Composite Column Question
On Wed, Dec 21, 2011 at 7:04 AM, Martin Arrowsmith wrote: > Dear Cassandra Experts, > > Are the number of composite attributes fixed for each column family ? > > I have been doing : "create column family MyCF with comparator = > 'CompositeType(IntegerType, UTF8Type)' > > And this creates a composite { integer:string } > > Hector complains when I give a 3rd attribute. If you use CompositeType(IntegerType, UTF8Type) as comparator, you cannot add a column with a name that has 3 components, because the comparator wouldn't know how to compare that 3rd component. However, the comparator is fine with you adding columns whose name doesn't have all of the specified components. In other words, if you know in advance that you may need up to 3 attributes (and you know their type), you can declare them all at first but use only the first components in some column names if needed. Now that's obviously a bit restrictive. In theory, it could be possible to update the comparator definition from CompositeType(IntegerType, UTF8Type) to say CompositeType(IntegerType, UTF8Type, UUIDType) at a later time, but it is not possible right now because C* doesn't allow any change of comparator. We should (and may) allow such a valid change at some point in the future. Now, there also exists a DynamicCompositeType, which doesn't put any restrictions on the number and types of the components it uses. The reason I'm mentioning it only at the end, however, is that it is trickier to use and has some overhead over CompositeType. I'm also not sure how much support Hector has for this comparator. I would avoid using DynamicCompositeType unless you know that CompositeType really doesn't work for you. -- Sylvain > > If "unstatic" composite columns are possible, what would be the CLI command > to create such a column family, and how can it be implemented ? > > Best wishes, > > Martin >
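Sylvain's point about using only a prefix of the declared components can be modelled with Python tuples of varying length. This is an illustration of the resulting ordering only, not Cassandra's actual byte-level comparator:

```python
# Names under a declared CompositeType(IntegerType, UTF8Type, UUIDType)
# comparator may use only a prefix of the components. Modelled as
# tuples: a shorter tuple that is a prefix of a longer one sorts first,
# which is what makes composite prefix slices work.
names = [(1, "b"), (1,), (2, "a"), (1, "a", "x")]
print(sorted(names))  # [(1,), (1, 'a', 'x'), (1, 'b'), (2, 'a')]
```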
Handling topology changes
Hi, I wonder about the best strategy when a cluster sees changes in its topology relatively often. My main concern is how to handle the initial token for new nodes. Suppose a cluster is first created with 7 nodes whose initial tokens are calculated with the formula here: http://wiki.apache.org/cassandra/Operations#Token_selection. It seems as though two strategies can be applied: * having a fixed amount of nodes with initial tokens and letting new ones auto bootstrap themselves * recomputing tokens for the new number of nodes and using nodetool move for each of those Is either of these right, and are there other strategies ? - pyr
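For reference, the token-selection formula on that wiki page (for RandomPartitioner) just divides the 2**127 token space evenly among the nodes. A sketch (`initial_tokens` is an illustrative helper, not a Cassandra tool):

```python
# Evenly spaced initial tokens for RandomPartitioner:
# node i of N gets token i * 2**127 / N.
def initial_tokens(node_count):
    return [i * 2 ** 127 // node_count for i in range(node_count)]

for token in initial_tokens(7):
    print(token)
```

Recomputing this list for the new node count is what the "nodetool move for each node" strategy amounts to.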
Re: Doubts related to composite type column names/values
Is it true that you can also just get the same results as when you pick a UTF8 key with this content: keyA:keyB Or should you really use the composite keys? If so, what is the big advantage of composite over combined utf-8 keys? Robin 2011/12/21 Sylvain Lebresne > On Tue, Dec 20, 2011 at 9:33 PM, Maxim Potekhin wrote: > > Thank you Aaron! As long as I have plain strings, would you say that I > would > > do almost as well with catenation? > > Not without a concatenation aware comparator. The padding aaron is talking > of > is not a mixed type problem only. What I mean here is that if you use a > simple > string comparator (UTF8Type, AsciiType or even BytesType), then you will > have > the following sorting: > "foo24:bar" > "foo:bar" > "foobar:bar" > because ':' is between '2' and 'b' in ascii, you could use another > separator but > you get the point. In other words, concatenating strings doesn't make the > comparator aware of that fact. > CompositeType on the other hand sorts each component separately, so it will > sort: > "foo" : "bar" > "foo24" : "bar" > "foobar" : "bar" > which is usually what you want. > > -- > Sylvain > > > > > Of course I realize that mixed types are a very different case where the > > composite is very useful. > > > > Thanks > > > > Maxim > > > > > > > > On 12/20/2011 2:44 PM, aaron morton wrote: > > > > Component values are compared in a type aware fashion, an Integer is an > > Integer. Not a 10 character zero padded string. > > > > You can also slice on the components. Just like with string concat, but > > nicer. . e.g. If you app is storing comments for a thing, and the column > > names have the form or you can > slice > > for all properties of a comment or all properties for comments between > two > > comment_id's > > > > Finally, the client library knows what's going on. > > > > Hope that helps. 
> > > > - > > Aaron Morton > > Freelance Developer > > @aaronmorton > > http://www.thelastpickle.com > > > > On 21/12/2011, at 7:43 AM, Maxim Potekhin wrote: > > > > With regards to static, what are major benefits as it compares with > > string catenation (with some convenient separator inserted)? > > > > Thanks > > > > Maxim > > > > > > On 12/20/2011 1:39 PM, Richard Low wrote: > > > > On Tue, Dec 20, 2011 at 5:28 PM, Ertio Lew wrote: > > > > With regard to the composite columns stuff in Cassandra, I have the > > > > following doubts : > > > > > > 1. What is the storage overhead of the composite type column > names/values, > > > > The values are the same. For each dimension, there is 3 bytes overhead. > > > > > > 2. what exactly is the difference between the DynamicComposite and Static > > > > Composite ? > > > > Static composite type has the types of each dimension specified in the > > > > column family definition, so all names within that column family have > > > > the same type. Dynamic composite type lets you specify the type for > > > > each column, so they can be different. There is extra storage > > > > overhead for this and care must be taken to ensure all column names > > > > remain comparable. > > > > > > > > > > >
Creating column families per client
Hello, I am evaluating the usage of Cassandra for my system. I will have several clients who won't share data with each other. My idea is to create one column family per client. When a new client comes in and adds data to the system, I'd like to create a column family dynamically. Is that reliable? Can I create a column family on a node and immediately add new data to that column family and be confident that the data added will eventually become visible to a read? []'s Rafael
Re: Creating column families per client
Hi, I don't know if this will be technically possible, but I just want to warn you about creating a lot of column families. When you have a lot of clients, you will have a lot of column families and, if I'm right, each column family uses memory on every node. You will run out of memory very fast, because per-column-family memory usage does not scale horizontally (adding nodes won't resolve the problem, you will have to upgrade hardware on existing nodes, which is not Cassandra's philosophy). I think you should use rows or columns inside your column families as much as possible unless you really have only a few clients. Once again, I'm not a Cassandra expert. I might be wrong :-). Alain 2011/12/21 Rafael Almeida > Hello, > > I am evaluating the usage of cassandra for my system. I will have several > clients who won't share data with each other. My idea is to create one > column family per client. When a new client comes in and adds data to the > system, I'd like to create a column family dynamically. Is that reliable? > Can I create a column family on a node and immediately add new data on that > column family and be confident that the data added will eventually become > visible to a read? > > []'s > Rafael > > >
Re: Creating column families per client
Every node? I hadn't realized that. Is there a place where I can compute how much memory is being 'wasted' ? On 21 Dec 2011 15:09, "Alain RODRIGUEZ" wrote: > Hi, I don't know if this will be technically possible, but I just want to > warn you about creating a lot of column families. When you have a lot of > clients, you will have a lot of column families and, if I'm right, each > column family uses memory on every node. You will run out of memory very > fast, because of the non horizontal growth of the memory usage per column > family (adding nodes won't resolve the problem, you will have to upgrade > hardware on existing nodes, which is not Cassandra's philosophy). > > I think you should use rows or columns inside your column families as much > as possible unless you have really a few clients. > > Once again, I'm not a Cassandra expert. I might be wrong :-). > > Alain > > 2011/12/21 Rafael Almeida > >> Hello, >> >> I am evaluating the usage of cassandra for my system. I will have several >> clients who won't share data with each other. My idea is to create one >> column family per client. When a new client comes in and adds data to the >> system, I'd like to create a column family dynamically. Is that reliable? >> Can I create a column family on a node and immediately add new data on that >> column family and be confident that the data added will eventually become >> visible to a read? >> >> []'s >> Rafael >> >> >> >
Re: Creating column families per client
Hi, based on my experience with Cassandra 0.7.4, I strongly discourage you from doing that: we tried dynamic creation of column families, and it was a nightmare. First of all, the operation cannot be done concurrently, therefore you must find a way to avoid parallel creation (over the whole cluster, not just on a single node). The main problem however is with timestamps. The structure of your keyspace is versioned with a time-dependent id, which is assigned by the host where you perform the schema update based on the local machine time. If you do two updates in close succession on two different nodes, and their clocks are not perfectly synchronized (and they will never be), Cassandra might be confused by their relative ordering, and stop working altogether. Bottom line: don't. Flavio On 12/21/2011 14:45, Rafael Almeida wrote: Hello, I am evaluating the usage of cassandra for my system. I will have several clients who won't share data with each other. My idea is to create one column family per client. When a new client comes in and adds data to the system, I'd like to create a column family dynamically. Is that reliable? Can I create a column family on a node and immediately add new data on that column family and be confident that the data added will eventually become visible to a read? []'s Rafael
Re: Routine nodetool repair
Thank You! Could the lack of routine repair be why nodetool ring reports: node(1) Load -> 78.24 MB and node(2) Load -> 67.21 MB? The load span between the two nodes has been increasing ever so slowly... On Wed, Dec 21, 2011 at 1:00 AM, aaron morton wrote: > Here you go > http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 21/12/2011, at 2:44 PM, Blake Starkenburg wrote: > > I have been playing around with Cassandra for a few months now. Starting > to explore more of the routine maintenance and backup strategies and I have > a general question about nodetool repair. After reading the following page: > http://www.datastax.com/docs/0.8/operations/cluster_management it has > occurred to me that for these past few months I have NOT DONE any cleanup > or repair commands on a test 2-node cluster (and there have been quite a few > deletes, writes, etc.). > > For some reason I was under the assumption that Cassandra handled the > tombstone records from deletes automatically? Should I still run nodetool > repair and if so, what about old deletes which occurred months ago? > > Thank You! > > >
Re: questions on datastax opscenter
On Tue, Dec 20, 2011 at 1:49 PM, Feng Qu wrote: > I have two questions for community version of opscenter > > 1) does it work with multiple cassandra cluster? Tyler answered the second part. As for managing multiple clusters from a single opscenter: not yet, but it's on our radar. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Creating column families per client
What we have done to avoid creating multiple column families is to sort of namespace the row key. So if we have a column family of Users and accounts: "AccountA" and "AccountB", we do the following: Column Family User: "AccountA/ryan" : { first: Ryan, last: Lowe } "AccountB/ryan" : { first: Ryan, last: Smith} etc. For our needs, this did the same thing as having 2 "User" column families for "AccountA" and "AccountB" Ryan On Wed, Dec 21, 2011 at 10:34 AM, Flavio Baronti wrote: > Hi, > > based on my experience with Cassandra 0.7.4, i strongly discourage you to > do that: we tried dynamical creation of column families, and it was a > nightmare. > First of all, the operation can not be done concurrently, therefore you > must find a way to avoid parallel creation (over all the cluster, not in a > single node). > The main problem however is with timestamps. The structure of your > keyspace is versioned with a time-dependent id, which is assigned by the > host where you perform the schema update based on the local machine time. > If you do two updates in close succession on two different nodes, and their > clocks are not perfectly synchronized (and they will never be), Cassandra > might be confused by their relative ordering, and stop working altogether. > > Bottom line: don't. > > Flavio > > Il 12/21/2011 14:45 PM, Rafael Almeida ha scritto: > > Hello, >> >> I am evaluating the usage of cassandra for my system. I will have several >> clients who won't share data with each other. My idea is to create one >> column family per client. When a new client comes in and adds data to the >> system, I'd like to create a column family dynamically. Is that reliable? >> Can I create a column family on a node and imediately add new data on that >> column family and be confident that the data added will eventually become >> visible to a read? >> >> []'s >> Rafael >> >> >> >> >
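Ryan's scheme boils down to building row keys with an account prefix. A minimal sketch (the function name and the "/" separator are just the convention from his example):

```python
# One shared "User" column family; rows are namespaced per account
# by prefixing the account name to the row key.
def user_row_key(account, username):
    return "%s/%s" % (account, username)

print(user_row_key("AccountA", "ryan"))  # AccountA/ryan
print(user_row_key("AccountB", "ryan"))  # AccountB/ryan
```

With an order-preserving partitioner this also keeps one account's rows contiguous, so an account's users can be read with a single key-range scan.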
Re: Creating column families per client
The overhead for column families was greatly reduced in 0.8 and 1.0. It should now be possible to have hundreds or thousands of column families. The setting 'memtable_total_space_in_mb' was introduced that allows for a global memtable threshold, and cassandra will handle flushing on its own. See http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management Another thing you should consider is the lack of built in access controls. There is an authentication/authorization interface you can plug in to and examples in the examples/ directory of the source download. On Wed, Dec 21, 2011 at 10:36 AM, Ryan Lowe wrote: > What we have done to avoid creating multiple column families is to sort of > namespace the row key. So if we have a column family of Users and accounts: > "AccountA" and "AccountB", we do the following: > > Column Family User: > "AccountA/ryan" : { first: Ryan, last: Lowe } > "AccountB/ryan" : { first: Ryan, last: Smith} > > etc. > > For our needs, this did the same thing as having 2 "User" column families > for "AccountA" and "AccountB" > > Ryan > > > On Wed, Dec 21, 2011 at 10:34 AM, Flavio Baronti > wrote: >> >> Hi, >> >> based on my experience with Cassandra 0.7.4, i strongly discourage you to >> do that: we tried dynamical creation of column families, and it was a >> nightmare. >> First of all, the operation can not be done concurrently, therefore you >> must find a way to avoid parallel creation (over all the cluster, not in a >> single node). >> The main problem however is with timestamps. The structure of your >> keyspace is versioned with a time-dependent id, which is assigned by the >> host where you perform the schema update based on the local machine time. If >> you do two updates in close succession on two different nodes, and their >> clocks are not perfectly synchronized (and they will never be), Cassandra >> might be confused by their relative ordering, and stop working altogether. 
>> >> Bottom line: don't. >> >> Flavio >> >> Il 12/21/2011 14:45 PM, Rafael Almeida ha scritto: >> >>> Hello, >>> >>> I am evaluating the usage of cassandra for my system. I will have several >>> clients who won't share data with each other. My idea is to create one >>> column family per client. When a new client comes in and adds data to the >>> system, I'd like to create a column family dynamically. Is that reliable? >>> Can I create a column family on a node and imediately add new data on that >>> column family and be confident that the data added will eventually become >>> visible to a read? >>> >>> []'s >>> Rafael >>> >>> >>> >> >
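For reference, the global threshold mentioned above lives in cassandra.yaml. A minimal sketch (the value 2048 is an arbitrary example, not a recommendation):

```yaml
# cassandra.yaml (0.8/1.0-era): one global memtable memory budget,
# flushed by Cassandra as needed, instead of per-CF thresholds.
memtable_total_space_in_mb: 2048
```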
Re: Counter read requests spread across replicas ?
How many rows are you asking for in the multiget_slice and what thread pools are showing pending tasks ? Also, what happens when you reduce the number of rows in the request? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 21/12/2011, at 11:57 AM, Philippe wrote: > Hello, > 5 nodes running 0.8.7/0.8.9, RF=3, BOP, counter columns inside super columns. > Read queries are multigetslices of super columns inside of which I read every > column for processing (20-30 at most), using Hector with default settings. > Watching tpstats on the 3 nodes holding the data being most often queried, I > see the pending count increase only on the "main replica" and I see heavy CPU > load and network load only on that node. The other nodes seem to be doing > very little. > > Aren't counter read requests supposed to be round-robin across replicas ? I'm > confused as to why the nodes don't exhibit the same load. > > Thanks
Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys
AFAIK there are no plans to kill the BOP, but I would still try to make your life easier by using the RP. My understanding of the problem is at certain times you snapshot the files in a dir; and the main query you want to handle is "At what points between time t0 and time t1 did files x,y and z exist?". You could consider: 1) Partitioning the time series data across rows, making the row key the timestamp for the start of the partition. If you have rollup partitions consider making the row key , e.g. <123456789."1d"> for a 1 day partition that starts at 123456789 2) In each row use column names that have the form where time stamp is the time of the snapshot. To query between two times (t0 and t1): 1) Determine which partitions the time span covers, this will give you a list of rows. 2) Execute a multi-get slice for all the rows using and (I'm using * here as a null, check with your client to see how to use composite columns.) Hope that helps. Aaron - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 21/12/2011, at 9:03 AM, Bryce Allen wrote: > I wasn't aware of CompositeColumns, thanks for the tip. However I think > it still doesn't allow me to do the query I need - basically I need to > do a timestamp range query, limiting only to certain file names at > each timestamp. With BOP and a separate row for each timestamp, > prefixed by a random UUID, and file names as column names, I can do this > query. With CompositeColumns, I can only query one contiguous range, so > I'd have to know the timestamps before hand to limit the file names. I > can resolve this using indexes, but on paper it looks like this would be > significantly slower (it would take me 5 round trips instead of 3 to > complete each query, and the query is made multiple times on every > single client request). > > The two down sides I've seen listed for BOP are balancing issues and > hotspots. 
I can understand why RP is recommended, from the balancing > issues alone. However these aren't problems for my application. Is > there anything else I am missing? Does the Cassandra team plan on > continuing to support BOP? I haven't completely ruled out RP, but I > like having BOP as an option, it opens up interesting modeling > alternatives that I think have real advantages for some > (if uncommon) applications. > > Thanks, > Bryce > > On Wed, 21 Dec 2011 08:08:16 +1300 > aaron morton wrote: >> Bryce, >> Have you considered using CompositeColumns and a standard CF? >> Row key is the UUID column name is (timestamp : dir_entry) you can >> then slice all columns with a particular time stamp. >> >> Even if you have a random key, I would use the RP unless you >> have an extreme use case. >> >> Cheers >> >> - >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 21/12/2011, at 3:06 AM, Bryce Allen wrote: >> >>> I think it comes down to how much you benefit from row range scans, >>> and how confident you are that going forward all data will continue >>> to use random row keys. >>> >>> I'm considering using BOP as a way of working around the non indexes >>> super column limitation. In my current schema, row keys are random >>> UUIDs, super column names are timestamps, and columns contain a >>> snapshot in time of directory contents, and could be quite large. If >>> instead I use row keys that are (uuid)-(timestamp), and use a >>> standard column family, I can do a row range query and select only >>> specific columns. I'm still evaluating if I can do this with BOP - >>> ideally the token would just use the first 128 bits of the key, and >>> I haven't found any documentation on how it compares keys of >>> different length. >>> >>> Another trick with BOP is to use MD5(rowkey)-rowkey for data that >>> has non uniform row keys. 
I think it's reasonable to use if most >>> data is uniform and benefits from range scans, but a few things are >>> added that aren't/don't. This trick does make the keys larger, >>> which increases storage cost and IO load, so it's probably a bad >>> idea if a significant subset of the data requires it. >>> >>> Disclaimer - I wrote that wiki article to fill in a documentation >>> gap, since there were no examples of BOP and I wasted a lot of time >>> before I noticed the hex byte array vs decimal distinction for >>> specifying the initial tokens (which to be fair is documented, just >>> easy to miss on a skim). I'm also new to cassandra, I'm just >>> describing what makes sense to me "on paper". FWIW I confirmed that >>> random UUIDs (type 4) row keys really do evenly distribute when >>> using BOP. >>> >>> -Bryce >>> >>> On Mon, 19 Dec 2011 19:01:00 -0800 >>> Drew Kutcharian wrote: Hey Guys, I just came across http://wiki.apache.org/cassandra/ByteOrderedPartitioner and it got me thinking. If the row keys are java.util.UUID which are generated r
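Aaron's partitioning suggestion upthread maps a (t0, t1) query to a list of partition row keys. A hedged sketch, assuming 1-day partitions and a "timestamp.1d" row-key convention (both assumptions are illustrative, not from the thread):

```python
PARTITION_SECONDS = 86400  # assumed 1-day rollup partitions

def partition_row_keys(t0, t1):
    """Row keys for every 1-day partition overlapping [t0, t1]."""
    # Snap t0 down to the start of its partition, then step forward.
    start = (t0 // PARTITION_SECONDS) * PARTITION_SECONDS
    return ["%d.1d" % t for t in range(start, t1 + 1, PARTITION_SECONDS)]

# A (t0, t1) query becomes a multi-get slice over these rows,
# slicing columns between the two timestamps.
print(partition_row_keys(0, 200000))  # ['0.1d', '86400.1d', '172800.1d']
```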
Re: Handling topology changes
How often is "relatively often" ? > * having a fixed amount of nodes with initial tokens and letting new > ones auto bootstrap themselves Generally a bad idea, you should make sure nodes are given sensible tokens that evenly distribute the data. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 22/12/2011, at 12:34 AM, p...@smallrivers.com wrote: > Hi, > > I wonder about the best strategy when a cluster sees changes in its > topology relatively often. > > My main concern is how to handle the initial token for new nodes. If a > cluster is first created with 7 nodes for which the initial token is > calculated with the formula here: > http://wiki.apache.org/cassandra/Operations#Token_selection. > > It seems as though two strategies can be applied: > > * having a fixed amount of nodes with initial tokens and letting new > ones auto bootstrap themselves > * recomputing tokens for the new number of nodes and using nodetool move > for each of those > > Is any of these right and are there other strategies ? > > - pyr
Re: Doubts related to composite type column names/values
Keys are sorted by their token; when using the RandomPartitioner this is an MD5 hash. So they are essentially randomly sorted. I would use CompositeTypes as keys if they make sense for your app, e.g. you are storing time series data and the row key is the timestamp and the length of the time span. In this case you have a stable known format of . The advantage here is the same as any time you introduce type awareness into a system: somewhere, some code will notice if you try to store a key of the wrong form. If you have keys that have a variable number of elements, such as a path hierarchy, it would not make sense to model that as a CompositeType (IMHO). Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 22/12/2011, at 1:26 AM, R. Verlangen wrote: > Is it true that you can also just get the same results as when you pick a > UTF8 key with this content: > keyA:keyB > > Or should you really use the composite keys? If so, what is the big advantage > of composite over combined utf-8 keys? > > Robin > > 2011/12/21 Sylvain Lebresne > On Tue, Dec 20, 2011 at 9:33 PM, Maxim Potekhin wrote: > > Thank you Aaron! As long as I have plain strings, would you say that I would > > do almost as well with catenation? > > Not without a concatenation aware comparator. The padding aaron is talking of > is not a mixed type problem only. What I mean here is that if you use a simple > string comparator (UTF8Type, AsciiType or even BytesType), then you will have > the following sorting: > "foo24:bar" > "foo:bar" > "foobar:bar" > because ':' is between '2' and 'b' in ascii, you could use another separator > but > you get the point. In other words, concatenating strings doesn't make the > comparator aware of that fact. > CompositeType on the other hand sorts each component separately, so it will > sort: > "foo" : "bar" > "foo24" : "bar" > "foobar" : "bar" > which is usually what you want. 
> > -- > Sylvain > > > > > Of course I realize that mixed types are a very different case where the > > composite is very useful. > > > > Thanks > > > > Maxim > > > > > > > > On 12/20/2011 2:44 PM, aaron morton wrote: > > > > Component values are compared in a type aware fashion, an Integer is an > > Integer. Not a 10 character zero padded string. > > > > You can also slice on the components. Just like with string concat, but > > nicer. . e.g. If you app is storing comments for a thing, and the column > > names have the form or you can slice > > for all properties of a comment or all properties for comments between two > > comment_id's > > > > Finally, the client library knows what's going on. > > > > Hope that helps. > > > > - > > Aaron Morton > > Freelance Developer > > @aaronmorton > > http://www.thelastpickle.com > > > > On 21/12/2011, at 7:43 AM, Maxim Potekhin wrote: > > > > With regards to static, what are major benefits as it compares with > > string catenation (with some convenient separator inserted)? > > > > Thanks > > > > Maxim > > > > > > On 12/20/2011 1:39 PM, Richard Low wrote: > > > > On Tue, Dec 20, 2011 at 5:28 PM, Ertio Lew wrote: > > > > With regard to the composite columns stuff in Cassandra, I have the > > > > following doubts : > > > > > > 1. What is the storage overhead of the composite type column names/values, > > > > The values are the same. For each dimension, there is 3 bytes overhead. > > > > > > 2. what exactly is the difference between the DynamicComposite and Static > > > > Composite ? > > > > Static composite type has the types of each dimension specified in the > > > > column family definition, so all names within that column family have > > > > the same type. Dynamic composite type lets you specify the type for > > > > each column, so they can be different. There is extra storage > > > > overhead for this and care must be taken to ensure all column names > > > > remain comparable. > > > > > > > > > > >
Re: Routine nodetool repair
Post the output from nodetool ring and take a look at http://wiki.apache.org/cassandra/Operations#Token_selection Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 22/12/2011, at 5:21 AM, Blake Starkenburg wrote: > Thank You! > > Could the lack of routine repair be why nodetool ring reports: node(1) Load > -> 78.24 MB and node(2) Load -> 67.21 MB? The load span between the two nodes > has been increasing ever so slowly... > > On Wed, Dec 21, 2011 at 1:00 AM, aaron morton wrote: > Here you go > http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 21/12/2011, at 2:44 PM, Blake Starkenburg wrote: > >> I have been playing around with Cassandra for a few months now. Starting to >> explore more of the routine maintenance and backup strategies and I have a >> general question about nodetool repair. After reading the following page: >> http://www.datastax.com/docs/0.8/operations/cluster_management it has >> occurred to me that for these past few months I have NOT DONE any cleanup or >> repair commands on a test 2-node cluster (and their has been quite a few >> deletes, writes, etc.). >> >> For some reason I was under the assumption that Cassandra handled the >> tombstone records from deletes automatically? Should I still run nodetool >> repair and if so, what about old deletes which occurred months ago? >> >> Thank You! > >
Re: Counter read requests spread across replicas ?
Hi Aaron, > How many rows are you asking for in the multget_slice and what thread pools are showing pending tasks ? I am querying in batches of 256 keys max. Each batch may slice between 1 and 5 explicit super columns (I need all the columns in each super column; there are at the very most a couple dozen columns per SC). On the first replica, only ReadStage ever shows any pending. All the others have 1 to 10 pending from time to time only. Here's a typical "high pending count" reading on the first replica for the data hotspot:

ReadStage    13    5238    10374301128    0    0

I've got a watch running every two seconds and I see the numbers vary every time, going from that high point to 0 active, 0 pending. The one thing I've noticed is that I hardly ever see the Active count stay up at the current 2s sampling rate. On the 2 other replicas, I hardly ever see any pendings on ReadStage, and Active hardly goes above 1 or 2. But I do see a little PENDING on RequestResponseStage; it goes up into the tens or hundreds from time to time. If I'm flooding that one replica, shouldn't the ReadStage Active count be at maximum capacity? I've already thought of CASSANDRA-2980, but I'm running 0.8.7 and 0.8.9. > Also, what happens when you reduce the number of rows in the request? I've reduced the requests to batches of 16. I've had to increase the number of threads from 30 to 90 in order to get the same key throughput, because the throughput I measure drastically goes down on a per-thread basis. What I see: - CPU utilization is lower on the first replica (why would that be if the batches are smaller?) - Pending ReadStage on the first replica seems to be staying higher longer. Still goes down to 0 regularly. - Lowering to 60 client threads, I see non-zero active MutationStage and ReplicateOnWriteStage more often. For our use-case, the higher the throughput per client thread, the less rework will be done in our processing. 
Another experiment: I stopped the process that does all the reading and a little of the writing. All that's left is a single-threaded process that sends counter updates as fast as it can in batches of up to 50 mutations. First replica: pending counts go up into the low hundreds and back to 0; active goes up to 3 or 5 and that's the max. Some MutationStage active & pending => the process is indeed faster at updating the counters, so that doesn't surprise me given that a counter write requires a read. Second & third replicas: no ReadStage pendings at all. A little RequestResponseStage as earlier. Cheers Philippe > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 21/12/2011, at 11:57 AM, Philippe wrote: > > Hello, > 5 nodes running 0.8.7/0.8.9, RF=3, BOP, counter columns inside super > columns. Read queries are multigetslices of super columns inside of which I > read every column for processing (20-30 at most), using Hector with default > settings. > Watching tpstats on the 3 nodes holding the data being most often queried, > I see the pending count increase only on the "main replica" and I see heavy > CPU load and network load only on that node. The other nodes seem to be > doing very little. > > Aren't counter read requests supposed to be round-robin across replicas ? > I'm confused as to why the nodes don't exhibit the same load. > > Thanks > > >
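The batch-size reduction tried above (256 keys down to 16) is plain client-side chunking before issuing each multiget. A hedged sketch, with names of my own invention (the actual Hector calls are not shown):

```python
def chunked(keys, batch_size):
    """Yield successive batches of at most batch_size keys, so one huge
    multiget_slice doesn't pile up on a single coordinating replica."""
    for i in range(0, len(keys), batch_size):
        yield keys[i:i + batch_size]

# 40 keys split into batches of 16: each batch would become one
# multiget_slice call against the cluster.
keys = ["key%d" % i for i in range(40)]
batches = list(chunked(keys, 16))
print([len(b) for b in batches])  # [16, 16, 8]
```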
Re: Counter read requests spread across replicas ?
Along the same lines as the last experiment I did (the cluster is only being updated by a single-threaded batching process). All nodes are the same hardware & configuration. Why on earth would one node require disk IO and not the 2 replicas? The primary replica shows some disk activity (iostat shows about 40%):

total-cpu-usage -dsk/total-
usr sys idl wai hiq siq| read  writ
 67  10  19   2   0   3|4244k  364k

whereas the 2nd & 3rd replicas do not. 2nd:

total-cpu-usage -dsk/total-
usr sys idl wai hiq siq| read  writ
 42  13  41   0   0   3|   0      0
 47  15  34   0   0   4|4096B  185k
 49  14  35   0   0   3|   0   8192B
 47  16  33   0   0   4|   0   4096B
 44  13  41   0   0   3| 284k   112k

3rd:

 11   2  87   1   0   0|   0   136k
  0   0  99   0   0   0|   0      0
  9   1  90   0   0   0|4096B  128k
  2   2  96   0   0   0|   0      0
  0   0  99   0   0   0|   0      0
 11   1  87   0   0   0|   0   128k

Philippe 2011/12/21 Philippe > Hi Aaron, > > >How many rows are you asking for in the multget_slice and what thread > pools are showing pending tasks ? > I am querying in batches of 256 keys max. Each batch may slice between 1 > and 5 explicit super columns (I need all the columns in each super column, > there are at the very most a couple dozen columns per SC). > > On the first replica, only ReadStage ever shows any pending. All the > others have 1 to 10 pending from time to time only. Here's a typical "high > pending count" reading on the first replica for the data hotspot. > ReadStage13 523810374301128 0 > 0 > I've got a watch running every two seconds and I see the numbers vary > every time going from that high point to 0 active, 0 pending. The one thing > I've noticed is that I hardly every see the Active count stay up at the > current 2s sampling rate. > On the 2 other replicas, I hardly ever see any pendings on ReadStage and > Active hardly goes up to 1 or 2. But I do see a little PENDING > on RequestResponseStage, goes up in the tens or hundreds from time to time. > > > If I'm flooding that one replica, shouldn't the ReadStage Active count be > at maximum capacity ? 
> > > I've already thought of CASSANDRA-2980 but I'm running 0.8.7 and 0.8.9. > > Also, what happens when you reduce the number of rows in the request? >> > I've reduced the requests to batches of 16. I've had to increased the > number of threads from 30 to 90 in order to get the same key throughput > because the throughput I measure drastically goes down on a per thread > basis. > What I see : > - CPU utilization is lower on the first replica (why would that be if the > batches are smaller ?) > - Pending ReadStage on first replica seems to be staying higher longer. > Still goes down to 0 regularly. > - lowering to 60 client threads, I see non-zero active MutationStage and > ReplicateOnWriteStage more often > For our use-case, the higher the throughput per client thread, the less > rework will be done in our processing. > > Another experiment : I stopped the process that does all the reading and a > little of the writing. All that's left is a single-threaded process that > sending counter updates as fast as it can in batches of up to 50 mutations. > First replica : pending counts go up into the low hundreds and back to 0, > active up to 3 or 5 and that's a max. Some mutation stage active & pendings > => the process is indeed faster at updating the counters so that doesn't > surprise me given that a counter write requires a read. > Second & third replicas : no read stage pendings at all. A > little RequestResponseStage as earlier. > > Cheers > Philippe > >> >> Cheers >> >> - >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 21/12/2011, at 11:57 AM, Philippe wrote: >> >> Hello, >> 5 nodes running 0.8.7/0.8.9, RF=3, BOP, counter columns inside super >> columns. Read queries are multigetslices of super columns inside of which I >> read every column for processing (20-30 at most), using Hector with default >> settings. 
>> Watching tpstat on the 3 nodes holding the data being most often queries, >> I see the pending count increase only on the "main replica" and I see heavy >> CPU load and network load only on that node. The other nodes seem to be >> doing very little. >> >> Aren't counter read requests supposed to be round-robin across replicas ? >> I'm confused as to why the nodes don't exhibit the same load. >> >> Thanks >> >> >> >
Re: Handling topology changes
A couple of nodes per month, but with peaks. I will test the nodetool move-based scenario then. Cheers, - pyr On Wed, Dec 21, 2011 at 10:10 PM, aaron morton wrote: > How often is "relatively often" ? > > * having a fixed amount of nodes with initial tokens and letting new > ones auto bootstrap themselves > > Generally a bad idea, you should make sure nodes are given sensible tokens > that evenly distribute the data. >
Re: Routine nodetool repair
Output from nodetool ring:

Address       DC           Rack   Status  State   Load      Owns     Token
                                                                     85070591730234615865843651857942052864
110.82.155.2  datacenter1  rack1  Up      Normal  78.23 MB  50.00%   0
110.82.155.4  datacenter1  rack1  Up      Normal  67.21 MB  50.00%   85070591730234615865843651857942052864

On Wed, Dec 21, 2011 at 1:18 PM, aaron morton wrote: > Post the output from nodetool ring and take a look at > http://wiki.apache.org/cassandra/Operations#Token_selection > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 22/12/2011, at 5:21 AM, Blake Starkenburg wrote: > > Thank You! > > Could the lack of routine repair be why nodetool ring reports: node(1) > Load -> 78.24 MB and node(2) Load -> 67.21 MB? The load span between the > two nodes has been increasing ever so slowly... > > On Wed, Dec 21, 2011 at 1:00 AM, aaron morton wrote: > >> Here you go >> http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds >> >> Cheers >> >> - >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 21/12/2011, at 2:44 PM, Blake Starkenburg wrote: >> >> I have been playing around with Cassandra for a few months now. Starting >> to explore more of the routine maintenance and backup strategies and I have >> a general question about nodetool repair. After reading the following page: >> http://www.datastax.com/docs/0.8/operations/cluster_management it has >> occurred to me that for these past few months I have NOT DONE any cleanup >> or repair commands on a test 2-node cluster (and their has been quite a few >> deletes, writes, etc.). >> >> For some reason I was under the assumption that Cassandra handled the >> tombstone records from deletes automatically? Should I still run nodetool >> repair and if so, what about old deletes which occurred months ago? >> >> Thank You! >> >> >> > >
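For reference, the evenly spaced tokens that the Token_selection wiki page recommends for RandomPartitioner can be computed directly; a small sketch (the 2-node values match the ring output above):

```python
def balanced_tokens(node_count):
    # RandomPartitioner tokens live in [0, 2**127); space the initial
    # tokens evenly so each node owns an equal share of the ring.
    ring = 2 ** 127
    return [i * (ring // node_count) for i in range(node_count)]

print(balanced_tokens(2))
# [0, 85070591730234615865843651857942052864]
```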
Monitoring move progress
I've got some nodes in a "moving" state in a cluster (the nodes to which they stream shouldn't overlap), and I'm finding it difficult to determine whether they're actually making progress on the move at this point or are simply stuck in that state. In each case, I issued the move command as usual. The log shows information about the move when it begins, showing the correct token change that I would expect in each case. Compactions took place on each moving node, which can be viewed through "nodetool compactionstats" or through the CompactionManager in JMX. But eventually the compactions stopped, apart from various ongoing secondary index rebuilds and the related index compactions. Yet I see no stream transfers via netstats. My expectation was that after the compactions (which the project wiki refers to as "anti-compactions"), I would start to see outbound streaming activity in netstats. Yet I do not. I don't see any errors in the logs on the moving servers since the moves began. Using Cassandra 1.0.5, ByteOrderedPartitioner. Any suggestions on how to determine what's going on? Thanks in advance. - Ethan
Re: Routine nodetool repair
> Could the lack of routine repair be why nodetool ring reports: node(1) Load > -> 78.24 MB and node(2) Load -> 67.21 MB? The load span between the two > nodes has been increasing ever so slowly... No. Generally there will be a variation in load depending on what state compaction happens to be in on the given node (I am assuming you're not using leveled compaction). That is in addition to any imbalance that might result from your population of data in the cluster. Running repair can affect the live size, but *lack* of repair won't cause a live size divergence. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)