Re: Routine nodetool repair
Here you go http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 21/12/2011, at 2:44 PM, Blake Starkenburg wrote: > I have been playing around with Cassandra for a few months now. Starting to > explore more of the routine maintenance and backup strategies and I have a > general question about nodetool repair. After reading the following page: > http://www.datastax.com/docs/0.8/operations/cluster_management it has > occurred to me that for these past few months I have NOT DONE any cleanup or > repair commands on a test 2-node cluster (and there have been quite a few > deletes, writes, etc.). > > For some reason I was under the assumption that Cassandra handled the > tombstone records from deletes automatically? Should I still run nodetool > repair and if so, what about old deletes which occurred months ago? > > Thank You!
Re: Can I slice on composite indexes?
You can slice the "key1" row to get the columns that have "xyz" as the value for the first component in the column name. Check the docs in your client for how to do that. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 21/12/2011, at 3:04 PM, Maxim Potekhin wrote: > Let's say I have rows with composite columns Like > > ("key1", {('xyz', 'abc'): 'colval1'}, {('xyz', 'def'): 'colval2'}) > ("key2", {('ble', 'meh'): 'otherval'}) > > Is it possible to create a composite type index such that I can query on 'xyz' > and get the first two columns? > > Thanks > > Maxim >
Re: Doubts related to composite type column names/values
On Tue, Dec 20, 2011 at 9:33 PM, Maxim Potekhin wrote: > Thank you Aaron! As long as I have plain strings, would you say that I would > do almost as well with catenation? Not without a concatenation-aware comparator. The padding Aaron is talking about is not only a mixed-type problem. What I mean here is that if you use a simple string comparator (UTF8Type, AsciiType or even BytesType), then you will have the following sorting: "foo24:bar" "foo:bar" "foobar:bar" because ':' is between '2' and 'b' in ascii. You could use another separator, but you get the point. In other words, concatenating strings doesn't make the comparator aware of the components. CompositeType on the other hand sorts each component separately, so it will sort: "foo" : "bar" "foo24" : "bar" "foobar" : "bar" which is usually what you want. -- Sylvain > > Of course I realize that mixed types are a very different case where the > composite is very useful. > > Thanks > > Maxim > > > > On 12/20/2011 2:44 PM, aaron morton wrote: > > Component values are compared in a type aware fashion, an Integer is an > Integer. Not a 10 character zero padded string. > > You can also slice on the components. Just like with string concat, but > nicer, e.g. If your app is storing comments for a thing, and the column > names have the form or you can slice > for all properties of a comment or all properties for comments between two > comment_id's > > Finally, the client library knows what's going on. > > Hope that helps. > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 21/12/2011, at 7:43 AM, Maxim Potekhin wrote: > > With regards to static, what are the major benefits as compared with > string catenation (with some convenient separator inserted)? > > Thanks > > Maxim > > > On 12/20/2011 1:39 PM, Richard Low wrote: > > On Tue, Dec 20, 2011 at 5:28 PM, Ertio Lew wrote: > > With regard to the composite columns stuff in Cassandra, I have the > > following doubts : > > > 1. What is the storage overhead of the composite type column names/values, > > The values are the same. For each dimension, there is 3 bytes overhead. > > > 2. what exactly is the difference between the DynamicComposite and Static > > Composite ? > > Static composite type has the types of each dimension specified in the > > column family definition, so all names within that column family have > > the same type. Dynamic composite type lets you specify the type for > > each column, so they can be different. There is extra storage > > overhead for this and care must be taken to ensure all column names > > remain comparable. > > > > >
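Sylvain's two orderings can be reproduced in a few lines of Python. Sorting tuples element by element is only a rough model of a CompositeType comparator (Cassandra actually compares an encoded byte form), but it shows why plain concatenation sorts differently:

```python
# Concatenated keys: the separator byte ':' (0x3A) is compared against
# data bytes, and it happens to sort between '2' and 'b'.
concatenated = ["foo24:bar", "foo:bar", "foobar:bar"]

# Component-wise keys: tuples compare element by element, a rough model
# of CompositeType's per-component comparison.
composite = [("foo24", "bar"), ("foo", "bar"), ("foobar", "bar")]

print(sorted(concatenated))  # ['foo24:bar', 'foo:bar', 'foobar:bar']
print(sorted(composite))     # [('foo', 'bar'), ('foo24', 'bar'), ('foobar', 'bar')]
```

The concatenated order is surprising ("foo" does not sort first); the component-wise order is usually what you want.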
Re: Composite Column Question
On Wed, Dec 21, 2011 at 7:04 AM, Martin Arrowsmith wrote: > Dear Cassandra Experts, > > Are the number of composite attributes fixed for each column family ? > > I have been doing : "create column family MyCF with comparator = > 'CompositeType(IntegerType, UTF8Type)' > > And this creates a composite { integer:string } > > Hector complains when I give a 3rd attribute. If you use CompositeType(IntegerType, UTF8Type) as comparator, you cannot add a column with a name that has 3 components, because the comparator wouldn't know how to compare that 3rd component. However, the comparator is fine with you adding columns whose name doesn't have all of the specified components. In other words, if you know in advance that you may need up to 3 attributes (and you know their type), you can declare them all at first but use only the first components in some column names if needed. Now that's obviously a bit restrictive. In theory, it could be possible to update the comparator definition from CompositeType(IntegerType, UTF8Type) to say CompositeType(IntegerType, UTF8Type, UUIDType) at a later time, but it is not possible right now because C* doesn't allow any change of comparator. We should (and may) allow such a valid change at some point in the future. Now, there also exists a DynamicCompositeType, which doesn't put any restrictions on the number and types of the components it uses. The reason I'm mentioning it only at the end, however, is that it is trickier to use and has some overhead over CompositeType. I'm also not sure how much support Hector has for this comparator. I would avoid using DynamicCompositeType unless you know that CompositeType really doesn't work for you. -- Sylvain > > If "unstatic" composite columns are possible, what would be the CLI command > to create such a column family, and how can it be implemented ? > > Best wishes, > > Martin >
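Sylvain's point about using only a prefix of the declared components can be modelled with Python tuples of varying length. This is an illustration of the resulting ordering only, not Cassandra's actual byte-level comparator:

```python
# Names under a declared CompositeType(IntegerType, UTF8Type, UUIDType)
# comparator may use only a prefix of the components. Modelled as
# tuples: a shorter tuple that is a prefix of a longer one sorts first,
# which is what makes composite prefix slices work.
names = [(1, "b"), (1,), (2, "a"), (1, "a", "x")]
print(sorted(names))  # [(1,), (1, 'a', 'x'), (1, 'b'), (2, 'a')]
```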
Handling topology changes
Hi, I wonder about the best strategy when a cluster sees changes in its topology relatively often. My main concern is how to handle the initial token for new nodes. Suppose a cluster is first created with 7 nodes whose initial tokens are calculated with the formula here: http://wiki.apache.org/cassandra/Operations#Token_selection. It seems as though two strategies can be applied: * having a fixed amount of nodes with initial tokens and letting new ones auto bootstrap themselves * recomputing tokens for the new number of nodes and using nodetool move for each of those Is either of these right, and are there other strategies ? - pyr
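For reference, the token-selection formula on that wiki page (for RandomPartitioner) just divides the 2**127 token space evenly among the nodes. A sketch (`initial_tokens` is an illustrative helper, not a Cassandra tool):

```python
# Evenly spaced initial tokens for RandomPartitioner:
# node i of N gets token i * 2**127 / N.
def initial_tokens(node_count):
    return [i * 2 ** 127 // node_count for i in range(node_count)]

for token in initial_tokens(7):
    print(token)
```

Recomputing this list for the new node count is what the "nodetool move for each node" strategy amounts to.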
Re: Doubts related to composite type column names/values
Is it true that you can also just get the same results as when you pick a UTF8 key with this content: keyA:keyB Or should you really use the composite keys? If so, what is the big advantage of composite over combined utf-8 keys? Robin 2011/12/21 Sylvain Lebresne > On Tue, Dec 20, 2011 at 9:33 PM, Maxim Potekhin wrote: > > Thank you Aaron! As long as I have plain strings, would you say that I > would > > do almost as well with catenation? > > Not without a concatenation aware comparator. The padding aaron is talking > of > is not a mixed type problem only. What I mean here is that if you use a > simple > string comparator (UTF8Type, AsciiType or even BytesType), then you will > have > the following sorting: > "foo24:bar" > "foo:bar" > "foobar:bar" > because ':' is between '2' and 'b' in ascii, you could use another > separator but > you get the point. In other words, concatenating strings doesn't make the > comparator aware of that fact. > CompositeType on the other hand sorts each component separately, so it will > sort: > "foo" : "bar" > "foo24" : "bar" > "foobar" : "bar" > which is usually what you want. > > -- > Sylvain > > > > > Of course I realize that mixed types are a very different case where the > > composite is very useful. > > > > Thanks > > > > Maxim > > > > > > > > On 12/20/2011 2:44 PM, aaron morton wrote: > > > > Component values are compared in a type aware fashion, an Integer is an > > Integer. Not a 10 character zero padded string. > > > > You can also slice on the components. Just like with string concat, but > > nicer. . e.g. If you app is storing comments for a thing, and the column > > names have the form or you can > slice > > for all properties of a comment or all properties for comments between > two > > comment_id's > > > > Finally, the client library knows what's going on. > > > > Hope that helps. 
> > > > - > > Aaron Morton > > Freelance Developer > > @aaronmorton > > http://www.thelastpickle.com > > > > On 21/12/2011, at 7:43 AM, Maxim Potekhin wrote: > > > > With regards to static, what are major benefits as it compares with > > string catenation (with some convenient separator inserted)? > > > > Thanks > > > > Maxim > > > > > > On 12/20/2011 1:39 PM, Richard Low wrote: > > > > On Tue, Dec 20, 2011 at 5:28 PM, Ertio Lew wrote: > > > > With regard to the composite columns stuff in Cassandra, I have the > > > > following doubts : > > > > > > 1. What is the storage overhead of the composite type column > names/values, > > > > The values are the same. For each dimension, there is 3 bytes overhead. > > > > > > 2. what exactly is the difference between the DynamicComposite and Static > > > > Composite ? > > > > Static composite type has the types of each dimension specified in the > > > > column family definition, so all names within that column family have > > > > the same type. Dynamic composite type lets you specify the type for > > > > each column, so they can be different. There is extra storage > > > > overhead for this and care must be taken to ensure all column names > > > > remain comparable. > > > > > > > > > > >
Creating column families per client
Hello, I am evaluating the usage of Cassandra for my system. I will have several clients who won't share data with each other. My idea is to create one column family per client. When a new client comes in and adds data to the system, I'd like to create a column family dynamically. Is that reliable? Can I create a column family on a node and immediately add new data to that column family and be confident that the data added will eventually become visible to a read? []'s Rafael
Re: Creating column families per client
Hi, I don't know if this will be technically possible, but I just want to warn you about creating a lot of column families. When you have a lot of clients, you will have a lot of column families and, if I'm right, each column family uses memory on every node. You will run out of memory very fast, because per-column-family memory usage does not scale horizontally (adding nodes won't resolve the problem, you will have to upgrade hardware on existing nodes, which is not Cassandra's philosophy). I think you should use rows or columns inside your column families as much as possible unless you really have only a few clients. Once again, I'm not a Cassandra expert. I might be wrong :-). Alain 2011/12/21 Rafael Almeida > Hello, > > I am evaluating the usage of cassandra for my system. I will have several > clients who won't share data with each other. My idea is to create one > column family per client. When a new client comes in and adds data to the > system, I'd like to create a column family dynamically. Is that reliable? > Can I create a column family on a node and immediately add new data on that > column family and be confident that the data added will eventually become > visible to a read? > > []'s > Rafael > > >
Re: Creating column families per client
Every node? I hadn't realized that. Is there a place where I can compute how much memory is being 'wasted' ? On 21 Dec 2011 15:09, "Alain RODRIGUEZ" wrote: > Hi, I don't know if this will be technically possible, but I just want to > warn you about creating a lot of column families. When you have a lot of > clients, you will have a lot of column families and, if I'm right, each > column family uses memory on every node. You will run out of memory very > fast, because of the non horizontal growth of the memory usage per column > family (adding nodes won't resolve the problem, you will have to upgrade > hardware on existing nodes, which is not Cassandra's philosophy). > > I think you should use rows or columns inside your column families as much > as possible unless you have really a few clients. > > Once again, I'm not a Cassandra expert. I might be wrong :-). > > Alain > > 2011/12/21 Rafael Almeida > >> Hello, >> >> I am evaluating the usage of cassandra for my system. I will have several >> clients who won't share data with each other. My idea is to create one >> column family per client. When a new client comes in and adds data to the >> system, I'd like to create a column family dynamically. Is that reliable? >> Can I create a column family on a node and immediately add new data on that >> column family and be confident that the data added will eventually become >> visible to a read? >> >> []'s >> Rafael >> >> >> >
Re: Creating column families per client
Hi, based on my experience with Cassandra 0.7.4, I strongly discourage you from doing that: we tried dynamic creation of column families, and it was a nightmare. First of all, the operation cannot be done concurrently, therefore you must find a way to avoid parallel creation (over the whole cluster, not just on a single node). The main problem however is with timestamps. The structure of your keyspace is versioned with a time-dependent id, which is assigned by the host where you perform the schema update based on the local machine time. If you do two updates in close succession on two different nodes, and their clocks are not perfectly synchronized (and they will never be), Cassandra might be confused by their relative ordering, and stop working altogether. Bottom line: don't. Flavio On 12/21/2011 14:45, Rafael Almeida wrote: Hello, I am evaluating the usage of cassandra for my system. I will have several clients who won't share data with each other. My idea is to create one column family per client. When a new client comes in and adds data to the system, I'd like to create a column family dynamically. Is that reliable? Can I create a column family on a node and immediately add new data on that column family and be confident that the data added will eventually become visible to a read? []'s Rafael
Re: Routine nodetool repair
Thank You! Could the lack of routine repair be why nodetool ring reports: node(1) Load -> 78.24 MB and node(2) Load -> 67.21 MB? The load span between the two nodes has been increasing ever so slowly... On Wed, Dec 21, 2011 at 1:00 AM, aaron morton wrote: > Here you go > http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 21/12/2011, at 2:44 PM, Blake Starkenburg wrote: > > I have been playing around with Cassandra for a few months now. Starting > to explore more of the routine maintenance and backup strategies and I have > a general question about nodetool repair. After reading the following page: > http://www.datastax.com/docs/0.8/operations/cluster_management it has > occurred to me that for these past few months I have NOT DONE any cleanup > or repair commands on a test 2-node cluster (and there have been quite a few > deletes, writes, etc.). > > For some reason I was under the assumption that Cassandra handled the > tombstone records from deletes automatically? Should I still run nodetool > repair and if so, what about old deletes which occurred months ago? > > Thank You! > > >
Re: questions on datastax opscenter
On Tue, Dec 20, 2011 at 1:49 PM, Feng Qu wrote: > I have two questions for community version of opscenter > > 1) does it work with multiple cassandra cluster? Tyler answered the second part. As for managing multiple clusters from a single opscenter: not yet, but it's on our radar. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Creating column families per client
What we have done to avoid creating multiple column families is to sort of namespace the row key. So if we have a column family of Users and accounts: "AccountA" and "AccountB", we do the following: Column Family User: "AccountA/ryan" : { first: Ryan, last: Lowe } "AccountB/ryan" : { first: Ryan, last: Smith} etc. For our needs, this did the same thing as having 2 "User" column families for "AccountA" and "AccountB" Ryan On Wed, Dec 21, 2011 at 10:34 AM, Flavio Baronti wrote: > Hi, > > based on my experience with Cassandra 0.7.4, i strongly discourage you to > do that: we tried dynamical creation of column families, and it was a > nightmare. > First of all, the operation can not be done concurrently, therefore you > must find a way to avoid parallel creation (over all the cluster, not in a > single node). > The main problem however is with timestamps. The structure of your > keyspace is versioned with a time-dependent id, which is assigned by the > host where you perform the schema update based on the local machine time. > If you do two updates in close succession on two different nodes, and their > clocks are not perfectly synchronized (and they will never be), Cassandra > might be confused by their relative ordering, and stop working altogether. > > Bottom line: don't. > > Flavio > > Il 12/21/2011 14:45 PM, Rafael Almeida ha scritto: > > Hello, >> >> I am evaluating the usage of cassandra for my system. I will have several >> clients who won't share data with each other. My idea is to create one >> column family per client. When a new client comes in and adds data to the >> system, I'd like to create a column family dynamically. Is that reliable? >> Can I create a column family on a node and imediately add new data on that >> column family and be confident that the data added will eventually become >> visible to a read? >> >> []'s >> Rafael >> >> >> >> >
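Ryan's scheme boils down to building row keys with an account prefix. A minimal sketch (the function name and the "/" separator are just the convention from his example):

```python
# One shared "User" column family; rows are namespaced per account
# by prefixing the account name to the row key.
def user_row_key(account, username):
    return "%s/%s" % (account, username)

print(user_row_key("AccountA", "ryan"))  # AccountA/ryan
print(user_row_key("AccountB", "ryan"))  # AccountB/ryan
```

With an order-preserving partitioner this also keeps one account's rows contiguous, so an account's users can be read with a single key-range scan.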
Re: Creating column families per client
The overhead for column families was greatly reduced in 0.8 and 1.0. It should now be possible to have hundreds or thousands of column families. The setting 'memtable_total_space_in_mb' was introduced that allows for a global memtable threshold, and cassandra will handle flushing on its own. See http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management Another thing you should consider is the lack of built in access controls. There is an authentication/authorization interface you can plug in to and examples in the examples/ directory of the source download. On Wed, Dec 21, 2011 at 10:36 AM, Ryan Lowe wrote: > What we have done to avoid creating multiple column families is to sort of > namespace the row key. So if we have a column family of Users and accounts: > "AccountA" and "AccountB", we do the following: > > Column Family User: > "AccountA/ryan" : { first: Ryan, last: Lowe } > "AccountB/ryan" : { first: Ryan, last: Smith} > > etc. > > For our needs, this did the same thing as having 2 "User" column families > for "AccountA" and "AccountB" > > Ryan > > > On Wed, Dec 21, 2011 at 10:34 AM, Flavio Baronti > wrote: >> >> Hi, >> >> based on my experience with Cassandra 0.7.4, i strongly discourage you to >> do that: we tried dynamical creation of column families, and it was a >> nightmare. >> First of all, the operation can not be done concurrently, therefore you >> must find a way to avoid parallel creation (over all the cluster, not in a >> single node). >> The main problem however is with timestamps. The structure of your >> keyspace is versioned with a time-dependent id, which is assigned by the >> host where you perform the schema update based on the local machine time. If >> you do two updates in close succession on two different nodes, and their >> clocks are not perfectly synchronized (and they will never be), Cassandra >> might be confused by their relative ordering, and stop working altogether. 
>> >> Bottom line: don't. >> >> Flavio >> >> Il 12/21/2011 14:45 PM, Rafael Almeida ha scritto: >> >>> Hello, >>> >>> I am evaluating the usage of cassandra for my system. I will have several >>> clients who won't share data with each other. My idea is to create one >>> column family per client. When a new client comes in and adds data to the >>> system, I'd like to create a column family dynamically. Is that reliable? >>> Can I create a column family on a node and imediately add new data on that >>> column family and be confident that the data added will eventually become >>> visible to a read? >>> >>> []'s >>> Rafael >>> >>> >>> >> >
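For reference, the global threshold mentioned above lives in cassandra.yaml. A minimal sketch (the value 2048 is an arbitrary example, not a recommendation):

```yaml
# cassandra.yaml (0.8/1.0-era): one global memtable memory budget,
# flushed by Cassandra as needed, instead of per-CF thresholds.
memtable_total_space_in_mb: 2048
```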
Re: Counter read requests spread across replicas ?
How many rows are you asking for in the multiget_slice and what thread pools are showing pending tasks ? Also, what happens when you reduce the number of rows in the request? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 21/12/2011, at 11:57 AM, Philippe wrote: > Hello, > 5 nodes running 0.8.7/0.8.9, RF=3, BOP, counter columns inside super columns. > Read queries are multigetslices of super columns inside of which I read every > column for processing (20-30 at most), using Hector with default settings. > Watching tpstats on the 3 nodes holding the data being most often queried, I > see the pending count increase only on the "main replica" and I see heavy CPU > load and network load only on that node. The other nodes seem to be doing > very little. > > Aren't counter read requests supposed to be round-robin across replicas ? I'm > confused as to why the nodes don't exhibit the same load. > > Thanks
Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys
AFAIK there are no plans to kill the BOP, but I would still try to make your life easier by using the RP. My understanding of the problem is at certain times you snapshot the files in a dir; and the main query you want to handle is "At what points between time t0 and time t1 did files x,y and z exist?". You could consider: 1) Partitioning the time series data across rows, making the row key the timestamp for the start of the partition. If you have rollup partitions consider making the row key , e.g. <123456789."1d"> for a 1 day partition that starts at 123456789 2) In each row use column names that have the form where time stamp is the time of the snapshot. To query between two times (t0 and t1): 1) Determine which partitions the time span covers, this will give you a list of rows. 2) Execute a multi-get slice for all the rows using and (I'm using * here as a null, check with your client to see how to use composite columns.) Hope that helps. Aaron - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 21/12/2011, at 9:03 AM, Bryce Allen wrote: > I wasn't aware of CompositeColumns, thanks for the tip. However I think > it still doesn't allow me to do the query I need - basically I need to > do a timestamp range query, limiting only to certain file names at > each timestamp. With BOP and a separate row for each timestamp, > prefixed by a random UUID, and file names as column names, I can do this > query. With CompositeColumns, I can only query one contiguous range, so > I'd have to know the timestamps before hand to limit the file names. I > can resolve this using indexes, but on paper it looks like this would be > significantly slower (it would take me 5 round trips instead of 3 to > complete each query, and the query is made multiple times on every > single client request). > > The two down sides I've seen listed for BOP are balancing issues and > hotspots. 
I can understand why RP is recommended, from the balancing > issues alone. However these aren't problems for my application. Is > there anything else I am missing? Does the Cassandra team plan on > continuing to support BOP? I haven't completely ruled out RP, but I > like having BOP as an option, it opens up interesting modeling > alternatives that I think have real advantages for some > (if uncommon) applications. > > Thanks, > Bryce > > On Wed, 21 Dec 2011 08:08:16 +1300 > aaron morton wrote: >> Bryce, >> Have you considered using CompositeColumns and a standard CF? >> Row key is the UUID column name is (timestamp : dir_entry) you can >> then slice all columns with a particular time stamp. >> >> Even if you have a random key, I would use the RP unless you >> have an extreme use case. >> >> Cheers >> >> - >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 21/12/2011, at 3:06 AM, Bryce Allen wrote: >> >>> I think it comes down to how much you benefit from row range scans, >>> and how confident you are that going forward all data will continue >>> to use random row keys. >>> >>> I'm considering using BOP as a way of working around the non indexes >>> super column limitation. In my current schema, row keys are random >>> UUIDs, super column names are timestamps, and columns contain a >>> snapshot in time of directory contents, and could be quite large. If >>> instead I use row keys that are (uuid)-(timestamp), and use a >>> standard column family, I can do a row range query and select only >>> specific columns. I'm still evaluating if I can do this with BOP - >>> ideally the token would just use the first 128 bits of the key, and >>> I haven't found any documentation on how it compares keys of >>> different length. >>> >>> Another trick with BOP is to use MD5(rowkey)-rowkey for data that >>> has non uniform row keys. 
I think it's reasonable to use if most >>> data is uniform and benefits from range scans, but a few things are >>> added that aren't/don't. This trick does make the keys larger, >>> which increases storage cost and IO load, so it's probably a bad >>> idea if a significant subset of the data requires it. >>> >>> Disclaimer - I wrote that wiki article to fill in a documentation >>> gap, since there were no examples of BOP and I wasted a lot of time >>> before I noticed the hex byte array vs decimal distinction for >>> specifying the initial tokens (which to be fair is documented, just >>> easy to miss on a skim). I'm also new to cassandra, I'm just >>> describing what makes sense to me "on paper". FWIW I confirmed that >>> random UUIDs (type 4) row keys really do evenly distribute when >>> using BOP. >>> >>> -Bryce >>> >>> On Mon, 19 Dec 2011 19:01:00 -0800 >>> Drew Kutcharian wrote: Hey Guys, I just came across http://wiki.apache.org/cassandra/ByteOrderedPartitioner and it got me thinking. If the row keys are java.util.UUID which are generated r
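Aaron's partitioning suggestion upthread maps a (t0, t1) query to a list of partition row keys. A hedged sketch, assuming 1-day partitions and a "timestamp.1d" row-key convention (both assumptions are illustrative, not from the thread):

```python
PARTITION_SECONDS = 86400  # assumed 1-day rollup partitions

def partition_row_keys(t0, t1):
    """Row keys for every 1-day partition overlapping [t0, t1]."""
    # Snap t0 down to the start of its partition, then step forward.
    start = (t0 // PARTITION_SECONDS) * PARTITION_SECONDS
    return ["%d.1d" % t for t in range(start, t1 + 1, PARTITION_SECONDS)]

# A (t0, t1) query becomes a multi-get slice over these rows,
# slicing columns between the two timestamps.
print(partition_row_keys(0, 200000))  # ['0.1d', '86400.1d', '172800.1d']
```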
Re: Handling topology changes
How often is "relatively often" ? > * having a fixed amount of nodes with initial tokens and letting new > ones auto bootstrap themselves Generally a bad idea, you should make sure nodes are given sensible tokens that evenly distribute the data. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 22/12/2011, at 12:34 AM, p...@smallrivers.com wrote: > Hi, > > I wonder about the best strategy when a cluster sees changes in its > topology relatively often. > > My main concern is how to handle the initial token for new nodes. If a > cluster is first created with 7 nodes for which the initial token is > calculated with the formula here: > http://wiki.apache.org/cassandra/Operations#Token_selection. > > It seems as though two strategies can be applied: > > * having a fixed amount of nodes with initial tokens and letting new > ones auto bootstrap themselves > * recomputing tokens for the new number of nodes and using nodetool move > for each of those > > Is any of these right and are there other strategies ? > > - pyr
Re: Doubts related to composite type column names/values
Keys are sorted by their token; when using the RandomPartitioner this is an MD5 hash. So they are essentially randomly sorted. I would use CompositeTypes as keys if they make sense for your app, e.g. you are storing time series data and the row key is the timestamp and the length of the time span. In this case you have a stable known format of . The advantage here is the same as any time you introduce type awareness into a system: somewhere, some code will notice if you try to store a key of the wrong form. If you have keys that have a variable number of elements, such as a path hierarchy, it would not make sense to model that as a CompositeType (IMHO). Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 22/12/2011, at 1:26 AM, R. Verlangen wrote: > Is it true that you can also just get the same results as when you pick a > UTF8 key with this content: > keyA:keyB > > Or should you really use the composite keys? If so, what is the big advantage > of composite over combined utf-8 keys? > > Robin > > 2011/12/21 Sylvain Lebresne > On Tue, Dec 20, 2011 at 9:33 PM, Maxim Potekhin wrote: > > Thank you Aaron! As long as I have plain strings, would you say that I would > > do almost as well with catenation? > > Not without a concatenation aware comparator. The padding aaron is talking of > is not a mixed type problem only. What I mean here is that if you use a simple > string comparator (UTF8Type, AsciiType or even BytesType), then you will have > the following sorting: > "foo24:bar" > "foo:bar" > "foobar:bar" > because ':' is between '2' and 'b' in ascii, you could use another separator > but > you get the point. In other words, concatenating strings doesn't make the > comparator aware of that fact. > CompositeType on the other hand sorts each component separately, so it will > sort: > "foo" : "bar" > "foo24" : "bar" > "foobar" : "bar" > which is usually what you want. 
> > -- > Sylvain > > > > > Of course I realize that mixed types are a very different case where the > > composite is very useful. > > > > Thanks > > > > Maxim > > > > > > > > On 12/20/2011 2:44 PM, aaron morton wrote: > > > > Component values are compared in a type aware fashion, an Integer is an > > Integer. Not a 10 character zero padded string. > > > > You can also slice on the components. Just like with string concat, but > > nicer. . e.g. If you app is storing comments for a thing, and the column > > names have the form or you can slice > > for all properties of a comment or all properties for comments between two > > comment_id's > > > > Finally, the client library knows what's going on. > > > > Hope that helps. > > > > - > > Aaron Morton > > Freelance Developer > > @aaronmorton > > http://www.thelastpickle.com > > > > On 21/12/2011, at 7:43 AM, Maxim Potekhin wrote: > > > > With regards to static, what are major benefits as it compares with > > string catenation (with some convenient separator inserted)? > > > > Thanks > > > > Maxim > > > > > > On 12/20/2011 1:39 PM, Richard Low wrote: > > > > On Tue, Dec 20, 2011 at 5:28 PM, Ertio Lew wrote: > > > > With regard to the composite columns stuff in Cassandra, I have the > > > > following doubts : > > > > > > 1. What is the storage overhead of the composite type column names/values, > > > > The values are the same. For each dimension, there is 3 bytes overhead. > > > > > > 2. what exactly is the difference between the DynamicComposite and Static > > > > Composite ? > > > > Static composite type has the types of each dimension specified in the > > > > column family definition, so all names within that column family have > > > > the same type. Dynamic composite type lets you specify the type for > > > > each column, so they can be different. There is extra storage > > > > overhead for this and care must be taken to ensure all column names > > > > remain comparable. > > > > > > > > > > >
Re: Routine nodetool repair
Post the output from nodetool ring and take a look at http://wiki.apache.org/cassandra/Operations#Token_selection Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 22/12/2011, at 5:21 AM, Blake Starkenburg wrote: > Thank You! > > Could the lack of routine repair be why nodetool ring reports: node(1) Load > -> 78.24 MB and node(2) Load -> 67.21 MB? The load span between the two nodes > has been increasing ever so slowly... > > On Wed, Dec 21, 2011 at 1:00 AM, aaron morton wrote: > Here you go > http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 21/12/2011, at 2:44 PM, Blake Starkenburg wrote: > >> I have been playing around with Cassandra for a few months now. Starting to >> explore more of the routine maintenance and backup strategies and I have a >> general question about nodetool repair. After reading the following page: >> http://www.datastax.com/docs/0.8/operations/cluster_management it has >> occurred to me that for these past few months I have NOT DONE any cleanup or >> repair commands on a test 2-node cluster (and their has been quite a few >> deletes, writes, etc.). >> >> For some reason I was under the assumption that Cassandra handled the >> tombstone records from deletes automatically? Should I still run nodetool >> repair and if so, what about old deletes which occurred months ago? >> >> Thank You! > >
Re: Counter read requests spread across replicas ?
Hi Aaron, > How many rows are you asking for in the multget_slice and what thread pools are showing pending tasks ? I am querying in batches of 256 keys max. Each batch may slice between 1 and 5 explicit super columns (I need all the columns in each super column; there are at the very most a couple dozen columns per SC). On the first replica, only ReadStage ever shows any pending. All the others have 1 to 10 pending from time to time only. Here's a typical "high pending count" reading on the first replica for the data hotspot:

ReadStage    13    5238    10374301128    0    0

I've got a watch running every two seconds and I see the numbers vary every time, going from that high point to 0 active, 0 pending. The one thing I've noticed is that I hardly ever see the Active count stay up at the current 2s sampling rate. On the 2 other replicas, I hardly ever see any pendings on ReadStage, and Active hardly goes above 1 or 2. But I do see a little PENDING on RequestResponseStage; it goes up into the tens or hundreds from time to time. If I'm flooding that one replica, shouldn't the ReadStage Active count be at maximum capacity? I've already thought of CASSANDRA-2980, but I'm running 0.8.7 and 0.8.9. > Also, what happens when you reduce the number of rows in the request? I've reduced the requests to batches of 16. I've had to increase the number of threads from 30 to 90 in order to get the same key throughput, because the throughput I measure drastically goes down on a per-thread basis. What I see: - CPU utilization is lower on the first replica (why would that be if the batches are smaller?) - Pending ReadStage on the first replica seems to be staying higher longer. Still goes down to 0 regularly. - Lowering to 60 client threads, I see non-zero active MutationStage and ReplicateOnWriteStage more often. For our use-case, the higher the throughput per client thread, the less rework will be done in our processing. 
Another experiment: I stopped the process that does all the reading and a little of the writing. All that's left is a single-threaded process that sends counter updates as fast as it can in batches of up to 50 mutations. First replica: pending counts go up into the low hundreds and back to 0; active goes up to 3 or 5 and that's the max. Some MutationStage active & pending => the process is indeed faster at updating the counters, so that doesn't surprise me given that a counter write requires a read. Second & third replicas: no ReadStage pendings at all. A little RequestResponseStage as earlier. Cheers Philippe > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 21/12/2011, at 11:57 AM, Philippe wrote: > > Hello, > 5 nodes running 0.8.7/0.8.9, RF=3, BOP, counter columns inside super > columns. Read queries are multigetslices of super columns inside of which I > read every column for processing (20-30 at most), using Hector with default > settings. > Watching tpstats on the 3 nodes holding the data being most often queried, > I see the pending count increase only on the "main replica" and I see heavy > CPU load and network load only on that node. The other nodes seem to be > doing very little. > > Aren't counter read requests supposed to be round-robin across replicas ? > I'm confused as to why the nodes don't exhibit the same load. > > Thanks > > >
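The batch-size reduction tried above (256 keys down to 16) is plain client-side chunking before issuing each multiget. A hedged sketch, with names of my own invention (the actual Hector calls are not shown):

```python
def chunked(keys, batch_size):
    """Yield successive batches of at most batch_size keys, so one huge
    multiget_slice doesn't pile up on a single coordinating replica."""
    for i in range(0, len(keys), batch_size):
        yield keys[i:i + batch_size]

# 40 keys split into batches of 16: each batch would become one
# multiget_slice call against the cluster.
keys = ["key%d" % i for i in range(40)]
batches = list(chunked(keys, 16))
print([len(b) for b in batches])  # [16, 16, 8]
```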
Re: Counter read requests spread across replicas ?
Along the same lines as the last experiment I did (the cluster is only being updated by a single-threaded batching process). All nodes are the same hardware & configuration. Why on earth would one node require disk IO and not the 2 replicas? The primary replica shows some disk activity (iostat shows about 40%):

total-cpu-usage -dsk/total-
usr sys idl wai hiq siq| read  writ
 67  10  19   2   0   3|4244k  364k

whereas the 2nd & 3rd replicas do not. 2nd:

total-cpu-usage -dsk/total-
usr sys idl wai hiq siq| read  writ
 42  13  41   0   0   3|   0      0
 47  15  34   0   0   4|4096B  185k
 49  14  35   0   0   3|   0   8192B
 47  16  33   0   0   4|   0   4096B
 44  13  41   0   0   3| 284k   112k

3rd:

 11   2  87   1   0   0|   0   136k
  0   0  99   0   0   0|   0      0
  9   1  90   0   0   0|4096B  128k
  2   2  96   0   0   0|   0      0
  0   0  99   0   0   0|   0      0
 11   1  87   0   0   0|   0   128k

Philippe 2011/12/21 Philippe > Hi Aaron, > > >How many rows are you asking for in the multget_slice and what thread > pools are showing pending tasks ? > I am querying in batches of 256 keys max. Each batch may slice between 1 > and 5 explicit super columns (I need all the columns in each super column, > there are at the very most a couple dozen columns per SC). > > On the first replica, only ReadStage ever shows any pending. All the > others have 1 to 10 pending from time to time only. Here's a typical "high > pending count" reading on the first replica for the data hotspot. > ReadStage13 523810374301128 0 > 0 > I've got a watch running every two seconds and I see the numbers vary > every time going from that high point to 0 active, 0 pending. The one thing > I've noticed is that I hardly every see the Active count stay up at the > current 2s sampling rate. > On the 2 other replicas, I hardly ever see any pendings on ReadStage and > Active hardly goes up to 1 or 2. But I do see a little PENDING > on RequestResponseStage, goes up in the tens or hundreds from time to time. > > > If I'm flooding that one replica, shouldn't the ReadStage Active count be > at maximum capacity ? 
> > > I've already thought of CASSANDRA-2980 but I'm running 0.8.7 and 0.8.9. > > Also, what happens when you reduce the number of rows in the request? >> > I've reduced the requests to batches of 16. I've had to increased the > number of threads from 30 to 90 in order to get the same key throughput > because the throughput I measure drastically goes down on a per thread > basis. > What I see : > - CPU utilization is lower on the first replica (why would that be if the > batches are smaller ?) > - Pending ReadStage on first replica seems to be staying higher longer. > Still goes down to 0 regularly. > - lowering to 60 client threads, I see non-zero active MutationStage and > ReplicateOnWriteStage more often > For our use-case, the higher the throughput per client thread, the less > rework will be done in our processing. > > Another experiment : I stopped the process that does all the reading and a > little of the writing. All that's left is a single-threaded process that > sending counter updates as fast as it can in batches of up to 50 mutations. > First replica : pending counts go up into the low hundreds and back to 0, > active up to 3 or 5 and that's a max. Some mutation stage active & pendings > => the process is indeed faster at updating the counters so that doesn't > surprise me given that a counter write requires a read. > Second & third replicas : no read stage pendings at all. A > little RequestResponseStage as earlier. > > Cheers > Philippe > >> >> Cheers >> >> - >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 21/12/2011, at 11:57 AM, Philippe wrote: >> >> Hello, >> 5 nodes running 0.8.7/0.8.9, RF=3, BOP, counter columns inside super >> columns. Read queries are multigetslices of super columns inside of which I >> read every column for processing (20-30 at most), using Hector with default >> settings. 
>> Watching tpstat on the 3 nodes holding the data being most often queries, >> I see the pending count increase only on the "main replica" and I see heavy >> CPU load and network load only on that node. The other nodes seem to be >> doing very little. >> >> Aren't counter read requests supposed to be round-robin across replicas ? >> I'm confused as to why the nodes don't exhibit the same load. >> >> Thanks >> >> >> >
Re: Handling topology changes
A couple of nodes per month, but with peaks. I will test the nodetool move-based scenario then. Cheers, - pyr On Wed, Dec 21, 2011 at 10:10 PM, aaron morton wrote: > How often is "relatively often" ? > > * having a fixed amount of nodes with initial tokens and letting new > ones auto bootstrap themselves > > Generally a bad idea, you should make sure nodes are given sensible tokens > that evenly distribute the data. >
Re: Routine nodetool repair
Output from nodetool ring:

Address       DC           Rack   Status  State   Load      Owns     Token
                                                                     85070591730234615865843651857942052864
110.82.155.2  datacenter1  rack1  Up      Normal  78.23 MB  50.00%   0
110.82.155.4  datacenter1  rack1  Up      Normal  67.21 MB  50.00%   85070591730234615865843651857942052864

On Wed, Dec 21, 2011 at 1:18 PM, aaron morton wrote: > Post the output from nodetool ring and take a look at > http://wiki.apache.org/cassandra/Operations#Token_selection > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 22/12/2011, at 5:21 AM, Blake Starkenburg wrote: > > Thank You! > > Could the lack of routine repair be why nodetool ring reports: node(1) > Load -> 78.24 MB and node(2) Load -> 67.21 MB? The load span between the > two nodes has been increasing ever so slowly... > > On Wed, Dec 21, 2011 at 1:00 AM, aaron morton wrote: > >> Here you go >> http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds >> >> Cheers >> >> - >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 21/12/2011, at 2:44 PM, Blake Starkenburg wrote: >> >> I have been playing around with Cassandra for a few months now. Starting >> to explore more of the routine maintenance and backup strategies and I have >> a general question about nodetool repair. After reading the following page: >> http://www.datastax.com/docs/0.8/operations/cluster_management it has >> occurred to me that for these past few months I have NOT DONE any cleanup >> or repair commands on a test 2-node cluster (and their has been quite a few >> deletes, writes, etc.). >> >> For some reason I was under the assumption that Cassandra handled the >> tombstone records from deletes automatically? Should I still run nodetool >> repair and if so, what about old deletes which occurred months ago? >> >> Thank You! >> >> >> > >
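For reference, the evenly spaced tokens that the Token_selection wiki page recommends for RandomPartitioner can be computed directly; a small sketch (the 2-node values match the ring output above):

```python
def balanced_tokens(node_count):
    # RandomPartitioner tokens live in [0, 2**127); space the initial
    # tokens evenly so each node owns an equal share of the ring.
    ring = 2 ** 127
    return [i * (ring // node_count) for i in range(node_count)]

print(balanced_tokens(2))
# [0, 85070591730234615865843651857942052864]
```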
Monitoring move progress
I've got some nodes in a "moving" state in a cluster (the nodes to which they stream shouldn't overlap), and I'm finding it difficult to determine whether they're actually making progress on the move at this point or are simply stuck in that state. In each case, I issued the move command as usual. The log shows information about the move when it begins, showing the correct token change that I would expect in each case. Compactions took place on each moving node, which can be viewed through "nodetool compactionstats" or through the CompactionManager in JMX. But eventually the compactions stopped, apart from various ongoing secondary index rebuilds and the related index compactions. Yet I see no stream transfers via netstats. My expectation was that after the compactions (which the project wiki refers to as "anti-compactions"), I would start to see outbound streaming activity in netstats. Yet I do not. I don't see any errors in the logs on the moving servers since the moves began. Using Cassandra 1.0.5, ByteOrderedPartitioner. Any suggestions on how to determine what's going on? Thanks in advance. - Ethan
Re: Routine nodetool repair
> Could the lack of routine repair be why nodetool ring reports: node(1) Load > -> 78.24 MB and node(2) Load -> 67.21 MB? The load span between the two > nodes has been increasing ever so slowly... No. Generally there will be a variation in load depending on what state compaction happens to be in on the given node (I am assuming you're not using leveled compaction). That is in addition to any imbalance that might result from your population of data in the cluster. Running repair can affect the live size, but *lack* of repair won't cause a live size divergence. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)