understanding the cassandra storage scaling

2010-12-09 Thread Jonathan Colby
I have a very basic question which I have been unable to find in online documentation on cassandra. It seems like every node in a cassandra cluster contains all the data ever stored in the cluster (i.e., all nodes are identical). I don't understand how you can scale this on commodity servers with

Re: understanding the cassandra storage scaling

2010-12-09 Thread Ran Tavory
there are two numbers to look at, N the number of hosts in the ring (cluster) and R the number of replicas for each data item. R is configurable per column family. Typically for large clusters N >> R. For very small clusters it makes sense for R to be close to N in which case cassandra is useful s

unsubscribe

2010-12-09 Thread Massimo Carro
Massimo Carro www.liquida.it - www.liquida.com

Re: understanding the cassandra storage scaling

2010-12-09 Thread Jonathan Colby
Thanks Ran. This helps a little but unfortunately it's still a bit fuzzy for me. So is it not true that each node contains all the data in the cluster? I haven't come across any information on how clustered data is coordinated in cassandra. How does my query get directed to the right node? On Th

Re: understanding the cassandra storage scaling

2010-12-09 Thread Ran Tavory
> > So is it not true that each node contains all the data in the cluster? No, not in the general case, in fact rarely is it the case. Usually Rhttp://wiki.apache.org/cassandra/StorageConfiguration On Thu, Dec 9, 2010 at 12:43 PM, Jonathan Colby wrote: > Thanks Ran. This helps a little but unf

Re: understanding the cassandra storage scaling

2010-12-09 Thread Sylvain Lebresne
> This helps a little but unfortunately it's still a bit fuzzy for me. So is it > not true that each node contains all the data in the cluster? Not at all. Basically each node is responsible for only a part of the data (a range really). But for each data you can choose on how many nodes it is; this
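
A rough sketch, in Python, of how a key could be mapped to the nodes that hold it. The names `token_for` and `replicas_for`, the four-node ring and its evenly spaced tokens are all made up for illustration. Hashing with MD5 into a 2**127 token space is only a simplified stand-in for what a partitioner does, and placing replicas on the next nodes around the ring is an assumed placement rule; none of this is Cassandra's actual code.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token space, for illustration only


def token_for(key: str) -> int:
    """Hash a row key to a position on the ring (simplified stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def replicas_for(key: str, ring: list, r: int) -> list:
    """Return the names of the r nodes that would hold `key`.

    `ring` is a list of (token, node_name) pairs sorted by token.
    The first replica is the node with the first token >= the key's
    token (wrapping around); the others are the next nodes on the ring.
    """
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(r)]


# Hypothetical 4-node ring with evenly spaced tokens.
ring = [(i * RING_SIZE // 4, f"node{i}") for i in range(4)]
owners = replicas_for("user:42", ring, r=2)
```

With N = 4 and R = 2 each key lands on only two of the four nodes, so no single node holds all the data.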

Re: understanding the cassandra storage scaling

2010-12-09 Thread Jonathan Colby
awesome! Thank you guys for the really quick answers and the links to the presentations. On Thu, Dec 9, 2010 at 12:06 PM, Sylvain Lebresne wrote: >> This helps a little but unfortunately it's still a bit fuzzy for me.  So is it >> not true that each node contains all the data in the cluster? > >

N to N relationships

2010-12-09 Thread Sébastien Druon
Hello, For a specific case, we are thinking about representing a N to N relationship with a NxN Matrix in Cassandra. The relations will be only between a subset of elements, so the Matrix will mostly contain empty elements. We have a set of questions concerning this: - what is the best way to rep

Re: N to N relationships

2010-12-09 Thread David Boxenhorn
How about a regular CF where keys are n...@n ? Then, getting a matrix row would be the same cost as getting a matrix column (N gets), and it would be very easy to add element N+1. On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon wrote: > Hello, > > For a specific case, we are thinking about rep

Secondary indexes change everything?

2010-12-09 Thread David Boxenhorn
It seems to me that secondary indexes (new in 0.7) change everything when it comes to data modeling. - OOP becomes obsolete - primary indexes become obsolete if you ever want to do a range query (which you probably will...), better to assign a random row id Taken together, it's likely that very l

Re: Secondary indexes change everything?

2010-12-09 Thread David Boxenhorn
- OPP becomes obsolete (OOP is not obsolete!) - primary indexes become obsolete if you ever want to do a range query (which you probably will...), better to assign a random row id Taken together, it's likely that very little will remain of your old database schema... Am I right?

Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Timo Nentwig
Hi! I've 3 servers running (0.7rc1) with a replication_factor of 2 and use quorum for writes. But when I shut down one of them UnavailableExceptions are thrown. Why is that? Isn't that the sense of quorum and a fault-tolerant DB that it continues with the remaining 2 nodes and redistributes the

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Thibaut Britz
Hi, The UnavailableExceptions will be thrown because quorum of size 2 needs at least 2 nodes to be alive (as for quorum of size 3 as well). The data won't be automatically redistributed to other nodes. Thibaut On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig wrote: > Hi! > > I've 3 servers runnin

Re: unsubscribe

2010-12-09 Thread Eric Evans
On Thu, 2010-12-09 at 11:42 +0100, Massimo Carro wrote: > Massimo Carro > > www.liquida.it - www.liquida.com http://wiki.apache.org/cassandra/FAQ#unsubscribe -- Eric Evans eev...@rackspace.com

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Daniel Lundin
Quorum is really only useful when RF > 2, since for a quorum to succeed RF/2+1 replicas must be available. This means for RF = 2, consistency levels QUORUM and ALL yield the same result. /d On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig wrote: > Hi! > > I've 3 servers running (0.7rc1) with a
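
The RF/2+1 rule above can be checked with a few lines of Python. The function names are made up for illustration, and the division is assumed to be integer division, which is what makes quorum come out as 2 for both RF = 2 and RF = 3.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for QUORUM: RF/2 + 1, integer division."""
    return rf // 2 + 1


def tolerable_failures(rf: int) -> int:
    """Replicas of one key range that may be down while QUORUM still succeeds."""
    return rf - quorum(rf)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, can lose {tolerable_failures(rf)} replica(s)")
```

At RF = 2 the quorum is 2, the same as ALL, so no replica may be down; RF = 3 is the smallest value at which QUORUM survives the loss of one replica.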

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Timo Nentwig
On Dec 9, 2010, at 16:50, Daniel Lundin wrote: > Quorum is really only useful when RF > 2, since for a quorum to > succeed RF/2+1 replicas must be available. 2/2+1==2 and I killed 1 of 3, so... don't get it. > This means for RF = 2, consistency levels QUORUM and ALL yield the same > result

RE: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Viktor Jevdokimov
With 3 nodes and RF=2 you have 3 key ranges: N1+N2, N2+N3 and N3+N1. Killing N1 you've got only 1 alive range N2+N3 and 2/3 of the range is down for Quorum, which is actually all, so N1+N2 and N3+N1 fails. -Original Message- From: Timo Nentwig [mailto:timo.nent...@toptarif.de] Sent: Thur
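
The three-node case described above can be worked through in a small Python sketch. It assumes the replicas of each key range sit on consecutive nodes of the ring, which matches the N1+N2, N2+N3, N3+N1 layout in the message; the function names are hypothetical.

```python
def replica_sets(nodes: list, rf: int) -> list:
    """One replica set per key range: the owning node plus the next rf-1 nodes."""
    n = len(nodes)
    return [tuple(nodes[(i + k) % n] for k in range(rf)) for i in range(n)]


def available_ranges(nodes: list, rf: int, down: set, required: int) -> list:
    """Replica sets that still have at least `required` live replicas."""
    return [
        rs for rs in replica_sets(nodes, rf)
        if sum(1 for node in rs if node not in down) >= required
    ]


nodes = ["N1", "N2", "N3"]
quorum_ok = available_ranges(nodes, rf=2, down={"N1"}, required=2)  # QUORUM at RF=2
one_ok = available_ranges(nodes, rf=2, down={"N1"}, required=1)     # CL=ONE
```

With N1 down, only the N2+N3 range can still satisfy a quorum of 2, while every range still has one live replica, so requests at consistency level ONE keep working.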

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Sylvain Lebresne
It's 2 out of the number of replicas, not the number of nodes. At RF=2, you have 2 replicas. And since quorum is also 2 with that replication factor, you cannot lose a node, otherwise some query will end up as UnavailableException. Again, this is not related to the total number of nodes. Even with

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread David Boxenhorn
In other words, if you want to use QUORUM, you need to set RF>=3. (I know because I had exactly the same problem.) On Thu, Dec 9, 2010 at 6:05 PM, Sylvain Lebresne wrote: > It's 2 out of the number of replicas, not the number of nodes. At RF=2, you > have > 2 replicas. And since quorum is also

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Timo Nentwig
On Dec 9, 2010, at 17:39, David Boxenhorn wrote: > In other words, if you want to use QUORUM, you need to set RF>=3. > > (I know because I had exactly the same problem.) I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3), N1 will still remain on another node. Only i

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread David Boxenhorn
If that is what you want, use CL=ONE On Thu, Dec 9, 2010 at 6:43 PM, Timo Nentwig wrote: > > On Dec 9, 2010, at 17:39, David Boxenhorn wrote: > > > In other words, if you want to use QUORUM, you need to set RF>=3. > > > > (I know because I had exactly the same problem.) > > I naively assume that

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Nick Bailey
On Thu, Dec 9, 2010 at 10:43 AM, Timo Nentwig wrote: > > On Dec 9, 2010, at 17:39, David Boxenhorn wrote: > > > In other words, if you want to use QUORUM, you need to set RF>=3. > > > > (I know because I had exactly the same problem.) > > I naively assume that if I kill either node that holds N1 (

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Sylvain Lebresne
> I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3), > N1 will still remain on another node. Only if both fail, I actually lose > data. But apparently this is not how it works... Sure, the data that N1 holds is also on another node and you won't lose it by only losing

Cassandra and disk space

2010-12-09 Thread Mark
I recently ran into a problem during a repair operation where my nodes completely ran out of space and my whole cluster was... well, clusterfucked. I want to make sure how to prevent this problem in the future. Should I make sure that at all times every node is under 50% of its disk space? Ar

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Timo Nentwig
On Dec 9, 2010, at 17:55, Sylvain Lebresne wrote: >> I naively assume that if I kill either node that holds N1 (i.e. node 1 or >> 3), N1 will still remain on another node. Only if both fail, I actually lose >> data. But apparently this is not how it works... > > Sure, the data that N1 holds is

Re: N to N relationships

2010-12-09 Thread Sébastien Druon
Thanks a lot for the answer What about the indexing when adding a new element? Is it incremental? Thanks again On 9 December 2010 14:38, David Boxenhorn wrote: > How about a regular CF where keys are n...@n ? > > Then, getting a matrix row would be the same cost as getting a matrix > column (N

Re: N to N relationships

2010-12-09 Thread David Boxenhorn
What do you mean by indexing? On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon wrote: > Thanks a lot for the answer > > What about the indexing when adding a new element? Is it incremental? > > Thanks again > > > On 9 December 2010 14:38, David Boxenhorn wrote: > >> How about a regular CF where

Re: Secondary indexes change everything?

2010-12-09 Thread Tyler Hobbs
OPP is not yet obsolete. The included secondary indexes still aren't good at finding keys for ranges of indexed values, such as " name > 'b' and name < 'c' ". This is something that an OPP index would be good at. Of course, you can do something similar with one or more rows, so it's not that big

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Tyler Hobbs
If you switch your writes to CL ONE when a failure occurs, you might as well use ONE for all writes. ONE and QUORUM behave the same when all nodes are working correctly. - Tyler On Thu, Dec 9, 2010 at 11:26 AM, Timo Nentwig wrote: > > On Dec 9, 2010, at 17:55, Sylvain Lebresne wrote: > > >> I n

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Sylvain Lebresne
> And my application would fall back to ONE. Quorum writes will also fail so I > would also use ONE so that the app stays up. What would I have to do make the > data to redistribute when the broken node is up again? Simply call nodetool > repair on it? There are 3 mechanisms for that: - hinted

Re: Cassandra and disk space

2010-12-09 Thread Peter Schuller
> I recently ran into a problem during a repair operation where my nodes > completely ran out of space and my whole cluster was... well, clusterfucked. > > I want to make sure how to prevent this problem in the future. Depending on which version you're on, you may be seeing this: https://issue

Re: Cassandra and disk space

2010-12-09 Thread Tyler Hobbs
If you are on 0.6, repair is particularly dangerous with respect to disk space usage. If your replica is sufficiently out of sync, you can triple your disk usage pretty easily. This has been improved in 0.7, so repairs should use about half as much disk space, on average. In general, yes, keep y

Re: N to N relationships

2010-12-09 Thread Sébastien Druon
I mean if I have secondary indexes. Apparently they are calculated in the background... On 9 December 2010 18:33, David Boxenhorn wrote: > What do you mean by indexing? > > On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon wrote: > >> Thanks a lot for the answer >> >> What about the indexing when

Re: Secondary indexes change everything?

2010-12-09 Thread David Boxenhorn
What do you mean by, "The included secondary indexes still aren't good at finding keys for ranges of indexed values, such as " name > 'b' and name < 'c' "."? Do you mean that secondary indexes don't support range queries at all? Besides supporting range queries, I see the importance of secondary

Re: Cassandra and disk space

2010-12-09 Thread Rustam Aliyev
Are there any plans to improve this in future? For big data clusters this could be very expensive. Based on your comment, I will need 200TB of storage for 100TB of data to keep Cassandra running. -- Rustam. On 09/12/2010 17:56, Tyler Hobbs wrote: If you are on 0.6, repair is particularly dang

Stuck with adding nodes

2010-12-09 Thread Daniel Doubleday
Hi good people. I underestimated load during peak times and now I'm stuck with our production cluster. Right now it's 3 nodes, rf 3 so everything is everywhere. We have ~300GB data load. ~10MB/sec incoming traffic and ~50 (peak) reads/sec to the cluster The problem derives from our quorum read

Re: Stuck with adding nodes

2010-12-09 Thread Peter Schuller
> Currently I am copying all data files (that's all existing data) from one node > to the new nodes in hope that I could then manually assign them their new > tokenrange (nodetool move) and do cleanup. Unless I'm misunderstanding you I believe you should be setting the initial token. nodetool mov
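
For picking initial tokens when growing a ring, evenly spaced values are a common starting point. The Python sketch below assumes RandomPartitioner with a token space of 0 to 2**127; the function name and the five-node example are hypothetical, and the thread itself does not give this formula.

```python
TOKEN_SPACE = 2 ** 127  # assumed RandomPartitioner token range


def balanced_tokens(node_count: int) -> list:
    """Evenly spaced initial tokens, one per node, for a ring of node_count nodes."""
    return [i * TOKEN_SPACE // node_count for i in range(node_count)]


# Example: candidate tokens for a cluster grown to 5 nodes.
for index, token in enumerate(balanced_tokens(5)):
    print(f"node {index}: {token}")
```

Nodes already in the ring would still have to be moved to their new positions, so this only shows where the tokens should end up, not how to get there safely under load.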

Re: Cassandra and disk space

2010-12-09 Thread Tyler Hobbs
That depends on your scenario. In the worst case of one big CF, there's not much that can be easily done for the disk usage of compaction and cleanup (which is essentially compaction). If, instead, you have several column families and no single CF makes up the majority of your data, you can push

Re: Cassandra and disk space

2010-12-09 Thread Scott Dworkis
i recently finished a practice expansion of 4 nodes to 5 nodes, a series of "nodetool move", "nodetool cleanup" and jmx gc steps. i found that in some of the steps, disk usage actually grew to 2.5x the base data size on one of the nodes. i'm using 0.6.4. -scott On Thu, 9 Dec 2010, Rustam Al

Re: N to N relationships

2010-12-09 Thread Aaron Morton
Am assuming you have one matrix and you know the dimensions. Also as you say the most important queries are to get an entire column or an entire row. I would consider using a standard CF for the Columns and one for the Rows. The key for each would be the col / row number, each cassandra column name

Re: Secondary indexes change everything?

2010-12-09 Thread Jonathan Ellis
On Thu, Dec 9, 2010 at 12:16 PM, David Boxenhorn wrote: > What do you mean by, "The included secondary indexes still aren't good at > finding keys for ranges of indexed values, such as " name > 'b' and name < > 'c' "."? > > Do you mean that secondary indexes don't support range queries at all? ht

[OT] shout out for riptano training

2010-12-09 Thread Dave Viner
Just wanted to give a shout-out to Jonathan Ellis & the Riptano team for the awesome training they provided yesterday in Santa Monica. It was awesome, and I'd highly recommend it for anyone who is using or seriously considering using Cassandra. Just. freakin awesome. Dave Viner

Re: Running multiple instances on a single server --micrandra ??

2010-12-09 Thread Ryan King
Overall, I don't think this is a crazy idea, though I think I'd prefer cassandra to manage this setup. The problem you will run into is that because the storage port is assumed to be the same across the cluster you'll only be able to do this if you can assign multiple IPs to each server (one for e

Obscured question about data size in a Column Family

2010-12-09 Thread Joshua Partogi
Hi there, Quoting some information in the wiki about Cassandra limitations ( http://wiki.apache.org/cassandra/CassandraLimitations): ... So all the data from a given columnfamily/key pair had to fit in memory, or 2GB ... Does this mean 1. A ColumnFamily can only be 2GB of data 2. A Column (key/pair

Re: Cassandra and disk space

2010-12-09 Thread Rustam Aliyev
That depends on your scenario. In the worst case of one big CF, there's not much that can be easily done for the disk usage of compaction and cleanup (which is essentially compaction). If, instead, you have several column families and no single CF makes up the majority of your data, you can

Re: Cassandra and disk space

2010-12-09 Thread Tyler Hobbs
Yes, that's correct, but I wouldn't push it too far. You'll become much more sensitive to disk usage changes; in particular, rebalancing your cluster will be particularly difficult, and repair will also become dangerous. Disk performance also tends to drop when a disk nears capacity. There's no reco

Re: Cassandra and disk space

2010-12-09 Thread Nick Bailey
Additionally, cleanup will fail to run when the disk is more than 50% full. Another reason to stay below 50%. On Thu, Dec 9, 2010 at 6:03 PM, Tyler Hobbs wrote: > Yes, that's correct, but I wouldn't push it too far. You'll become much > more sensitive to disk usage changes; in particular, rebal
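
A simple way to watch the 50% line discussed in this thread is a small check run from cron or a monitoring agent. The Python sketch below is only an illustration: the function names and the example path are hypothetical, and the 0.5 threshold is just the rule of thumb mentioned above, not a value read from Cassandra.

```python
import shutil


def used_fraction(used_bytes: int, total_bytes: int) -> float:
    """Fraction of the disk that is in use."""
    return used_bytes / total_bytes


def has_headroom(used_bytes: int, total_bytes: int, limit: float = 0.5) -> bool:
    """True while disk usage is at or below the given limit."""
    return used_fraction(used_bytes, total_bytes) <= limit


def check_data_dir(path: str, limit: float = 0.5) -> bool:
    """Check the disk that holds `path` (e.g. a data file directory)."""
    usage = shutil.disk_usage(path)
    return has_headroom(usage.used, usage.total, limit)


if __name__ == "__main__":
    # "/var/lib/cassandra/data" is a hypothetical location; pass your own.
    print(check_data_dir("/"))
```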

Re: Cassandra and disk space

2010-12-09 Thread Rustam Aliyev
Thanks Tyler, this is really useful. Also, I noticed that you can specify multiple data file directories located on different disks. Let's say if I have machine with 4 x 500GB drives, what would be the difference between following 2 setups: 1. each drive mounted separately and has data file

Re: Cassandra and disk space

2010-12-09 Thread Robert Coli
On Thu, Dec 9, 2010 at 4:20 PM, Rustam Aliyev wrote: > Thanks Tyler, this is really useful. > [ RAID0 vs JBOD question ] > In other words, does splitting data folder into smaller ones bring any > performance or stability advantages? This is getting to be a FAQ, so here's my stock answer : There

Re: Cassandra and disk space

2010-12-09 Thread Brandon Williams
On Thu, Dec 9, 2010 at 6:20 PM, Rustam Aliyev wrote: > Also, I noticed that you can specify multiple data file directories located > on different disks. Let's say if I have machine with 4 x 500GB drives, what > would be the difference between following 2 setups: > >1. each drive mounted separ

Re: [OT] shout out for riptano training

2010-12-09 Thread Sal Fuentes
I second that as well. I actually found the training to be fun (love the new stuff in 0.7.0) and quite interesting. Now I'm looking forward to the next Cassandra Summit. Thank you Riptano. On Thu, Dec 9, 2010 at 2:48 PM, Dave Viner wrote: > Just wanted to give a shout-out to Jonathan Ellis & the

Re: N to N relationships

2010-12-09 Thread Nick Bailey
I would also recommend two column families. Storing the key as NxN would require you to hit multiple machines to query for an entire row or column with RandomPartitioner. Even with OPP you would need to pick row or columns to order by and the other would require hitting multiple machines. Two colu
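
The two-column-family layout suggested in this thread can be sketched with plain Python dictionaries standing in for the column families. The class and method names are made up, and nothing here talks to Cassandra; it only shows the data shape: every cell is written twice, once keyed by row and once keyed by column, so that either lookup is a single-key read.

```python
from collections import defaultdict


class SparseMatrix:
    """Sparse N x N relation stored twice: once by row, once by column."""

    def __init__(self):
        self.by_row = defaultdict(dict)  # stand-in for the "Rows" CF
        self.by_col = defaultdict(dict)  # stand-in for the "Columns" CF

    def set(self, row, col, value):
        # Each write goes to both structures; only non-empty cells are stored.
        self.by_row[row][col] = value
        self.by_col[col][row] = value

    def get_row(self, row):
        """All non-empty cells of one matrix row, as {col: value}."""
        return dict(self.by_row.get(row, {}))

    def get_col(self, col):
        """All non-empty cells of one matrix column, as {row: value}."""
        return dict(self.by_col.get(col, {}))


m = SparseMatrix()
m.set(1, 7, "a")
m.set(1, 9, "b")
m.set(4, 7, "c")
```

The trade-off is double the writes and storage in exchange for reading a whole row or a whole column from one key.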

[RELEASE] 0.7.0 rc2

2010-12-09 Thread Eric Evans
I'd have thought all that turkey and stuffing would have done more damage to momentum, but judging by the number of bug-fixes in the last couple of weeks, that isn't the case. As usual, I'd be remiss if I didn't point out that this is not yet a stable release. It's getting pretty close, but we'r

Re: Obscured question about data size in a Column Family

2010-12-09 Thread Jonathan Ellis
In <= 0.6 (but not 0.7) a row could not be larger than 2GB. 2GB is still the largest possible column value. On Thu, Dec 9, 2010 at 5:38 PM, Joshua Partogi wrote: > Hi there, > > Quoting an information in the wiki about Cassandra limitations > (http://wiki.apache.org/cassandra/CassandraLimitation

Re: NullPointerException in Beta3 and rc1

2010-12-09 Thread Wenjun Che
describe_schema_versions() returns a Map<String, List<String>> with one entry. The key is a UUID and List has one element, which is IP of my machine. I think this has something to do with 'truncate' command in CLI, I can reproduce by: 1. create a CF with column1 as a secondary index 2. add some rows 3. truncate t

Re: Cassandra and disk space

2010-12-09 Thread Bill de hÓra
This is true, but for larger installations I end up needing more servers to hold the disks, more racks to hold the servers, to the point where the overall cost per GB climbs (granted the cost per IOP is probably still good). AIUI, a chunk of that 50% is replicated data such that the truly available s

Re: NullPointerException in Beta3 and rc1

2010-12-09 Thread Jonathan Ellis
Can you still reproduce this with rc2, after starting with an empty data and commitlog directory? There used to be a bug w/ truncate + 2ary indexes but that should be fixed now. On Thu, Dec 9, 2010 at 8:53 PM, Wenjun Che wrote: > describe_schema_versions()  returns a Map<String, List<String>> with one > entry.  The

Re: Running multiple instances on a single server --micrandra ??

2010-12-09 Thread Bill de hÓra
On Tue, 2010-12-07 at 21:25 -0500, Edward Capriolo wrote: > The idea behind "micrandra" is for a 6 disk system run 6 instances of > Cassandra, one per disk. Use the RackAwareSnitch to make sure no > replicas live on the same node. > > The downsides > 1) we would have to manage 6x the instances