I have a very basic question which I have been unable to find answered in
the online documentation on Cassandra.
It seems like every node in a Cassandra cluster contains all the data
ever stored in the cluster (i.e., all nodes are identical). I don't
understand how you can scale this on commodity servers with limited disk space.
There are two numbers to look at: N, the number of hosts in the ring
(cluster), and R, the number of replicas for each data item. R is configurable
per keyspace.
Typically for large clusters N >> R. For very small clusters it makes sense
for R to be close to N, in which case every node does hold (nearly) all the data.
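For illustration, a minimal sketch (plain Python, not Cassandra's actual code) of how a partitioner maps a key onto a ring of N nodes and picks R replicas; all names here are made up:

    import hashlib
    from bisect import bisect_right

    def token(key):
        """Hash a key onto a numeric ring (stand-in for the partitioner)."""
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def replicas(key, node_tokens, R):
        """Return the R nodes responsible for `key`: the first node whose
        token follows the key's token, then the next R-1 nodes clockwise."""
        ring = sorted(node_tokens)          # node_tokens: {token: node_name}
        start = bisect_right(ring, token(key)) % len(ring)
        return [node_tokens[ring[(start + i) % len(ring)]] for i in range(R)]

    # 6 nodes (N=6), 3 replicas per key (R=3): each node stores roughly R/N = 1/2 of the data.
    nodes = {i * (2 ** 128 // 6): "node%d" % (i + 1) for i in range(6)}
    print(replicas("user:42", nodes, R=3))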
Massimo Carro
www.liquida.it - www.liquida.com
Thanks Ran. This helps a little, but unfortunately it's still a bit
fuzzy for me. So is it not true that each node contains all the data
in the cluster? I haven't come across any information on how clustered
data is coordinated in Cassandra. How does my query get directed to
the right node?
> So is it not true that each node contains all the data in the cluster?
No, not in the general case; in fact it rarely is. Usually R << N.
See http://wiki.apache.org/cassandra/StorageConfiguration
On Thu, Dec 9, 2010 at 12:43 PM, Jonathan Colby wrote:
> Thanks Ran. This helps a little, but unfortunately it's still a bit fuzzy for
> me. So is it not true that each node contains all the data in the cluster?
Not at all. Basically each node is responsible for only a part of the data (a
token range, really). But for each piece of data you can choose how many nodes
hold it; this is the replication factor.
Awesome! Thank you guys for the really quick answers and the links to
the presentations.
On Thu, Dec 9, 2010 at 12:06 PM, Sylvain Lebresne wrote:
>> This helps a little but unfortunately I'm still a bit fuzzy for me. So is it
>> not true that each node contains all the data in the cluster?
Hello,
For a specific case, we are thinking about representing an N-to-N
relationship with an NxN matrix in Cassandra.
The relations will only be between a subset of elements, so the matrix will
mostly contain empty elements.
We have a set of questions concerning this:
- what is the best way to represent this sparse matrix?
How about a regular CF where keys are n...@n ?
Then, getting a matrix row would be the same cost as getting a matrix column
(N gets), and it would be very easy to add element N+1.
On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon wrote:
> Hello,
>
> For a specific case, we are thinking about representing an N-to-N
> relationship with an NxN matrix in Cassandra.
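A minimal sketch of that single-CF layout (plain Python; a dict stands in for the column family, and since the key scheme above is masked as "n...@n" I'm assuming it means a composite key like "row@col"):

    cf = {}  # row key -> {column name: value}

    def put(i, j, value):
        cf["%d@%d" % (i, j)] = {"value": value}   # one key per non-empty cell

    def get_row(i, n):
        """Fetch matrix row i: one get per column -> N gets, same cost as a column."""
        return {j: cf.get("%d@%d" % (i, j), {}).get("value") for j in range(n)}

    def get_col(j, n):
        return {i: cf.get("%d@%d" % (i, j), {}).get("value") for i in range(n)}

    put(3, 7, "related")
    print(get_row(3, 10))   # only cell 7 is populated; the rest come back None (sparse)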
It seems to me that secondary indexes (new in 0.7) change everything when it
comes to data modeling.
- OPP becomes obsolete
- primary indexes become obsolete if you ever want to do a range query
(which you probably will...), better to assign a random row id
Taken together, it's likely that very little will remain of your old
database schema...
Am I right?
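As an illustration of the random-row-id-plus-secondary-index idea, a minimal sketch in plain Python (dicts stand in for the column family and the index; all names here are made up):

    import uuid
    from collections import defaultdict

    users = {}                      # row key -> columns
    by_country = defaultdict(set)   # stand-in for a secondary index on "country"

    def insert_user(name, country):
        key = uuid.uuid4().hex      # random row id: no meaning, no ordering assumed
        users[key] = {"name": name, "country": country}
        by_country[country].add(key)    # index maintained on write, like 0.7's secondary indexes
        return key

    insert_user("alice", "de")
    insert_user("bob", "de")
    insert_user("carol", "fr")

    # Equality query via the index -- no ordered (OPP) primary key needed.
    print([users[k]["name"] for k in by_country["de"]])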
Hi!
I've 3 servers running (0.7rc1) with a replication_factor of 2 and use quorum
for writes. But when I shut down one of them, UnavailableExceptions are thrown.
Why is that? Isn't it the point of quorum and a fault-tolerant DB that it
continues with the remaining 2 nodes and redistributes the data?
Hi,
The UnavailableExceptions are thrown because a quorum of size 2
needs at least 2 replicas to be alive (the quorum for RF=3 is also 2).
The data won't be automatically redistributed to other nodes.
Thibaut
On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig wrote:
> Hi!
>
> I've 3 servers running (0.7rc1) with a replication_factor of 2 and use quorum
> for writes.
On Thu, 2010-12-09 at 11:42 +0100, Massimo Carro wrote:
> Massimo Carro
>
> www.liquida.it - www.liquida.com
http://wiki.apache.org/cassandra/FAQ#unsubscribe
--
Eric Evans
eev...@rackspace.com
Quorum is really only useful when RF > 2, since for a quorum to
succeed, RF/2+1 replicas must be available.
This means that for RF = 2, consistency levels QUORUM and ALL yield the same result.
/d
On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig wrote:
> Hi!
>
> I've 3 servers running (0.7rc1) with a
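For reference, the quorum size is floor(RF/2) + 1; a quick sketch of what that implies:

    def quorum(rf):
        """Replicas that must respond for a QUORUM operation: floor(RF/2) + 1."""
        return rf // 2 + 1

    for rf in (1, 2, 3, 5):
        # Tolerable replica failures under QUORUM = RF - quorum(RF)
        print("RF=%d  quorum=%d  can lose %d replica(s)" % (rf, quorum(rf), rf - quorum(rf)))
    # RF=2 -> quorum=2: losing a single replica already makes QUORUM (and ALL) fail.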
On Dec 9, 2010, at 16:50, Daniel Lundin wrote:
> Quorum is really only useful when RF > 2, since the for a quorum to
> succeed RF/2+1 replicas must be available.
2/2+1==2 and I killed 1 of 3, so... don't get it.
> This means for RF = 2, consistency levels QUORUM and ALL yield the same
> result
With 3 nodes and RF=2 you have 3 key ranges: N1+N2, N2+N3 and N3+N1.
Killing N1 you've got only 1 fully alive range, N2+N3; for 2/3 of the ranges
quorum (which at RF=2 is actually ALL) cannot be met, so N1+N2 and N3+N1 fail.
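A tiny simulation of that scenario (plain Python; node and range names follow the explanation above):

    # 3 nodes, RF=2: each key range is replicated on two adjacent nodes.
    ranges = {"range1": ["N1", "N2"], "range2": ["N2", "N3"], "range3": ["N3", "N1"]}
    alive = {"N2", "N3"}            # N1 has been killed
    quorum = 2 // 2 + 1             # floor(RF/2) + 1 == 2 for RF=2

    for name, nodes in ranges.items():
        up = sum(1 for n in nodes if n in alive)
        ok = up >= quorum
        print("%s on %s: %d/%d replicas up -> %s"
              % (name, nodes, up, len(nodes), "OK" if ok else "UnavailableException"))
    # Only range2 (N2+N3) can satisfy QUORUM; the other two ranges fail.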
It's 2 out of the number of replicas, not the number of nodes. At RF=2, you
have 2 replicas. And since quorum is also 2 with that replication factor, you
cannot lose a node, otherwise some queries will end up as UnavailableException.
Again, this is not related to the total number of nodes. Even with many more
nodes, an RF=2 keyspace still cannot tolerate losing a replica at QUORUM.
In other words, if you want to use QUORUM, you need to set RF>=3.
(I know because I had exactly the same problem.)
On Thu, Dec 9, 2010 at 6:05 PM, Sylvain Lebresne wrote:
> I'ts 2 out of the number of replicas, not the number of nodes. At RF=2, you
> have
> 2 replicas. And since quorum is also
On Dec 9, 2010, at 17:39, David Boxenhorn wrote:
> In other words, if you want to use QUORUM, you need to set RF>=3.
>
> (I know because I had exactly the same problem.)
I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3),
N1 will still remain on another node. Only if both fail do I actually lose
data. But apparently this is not how it works...
If that is what you want, use CL=ONE
On Thu, Dec 9, 2010 at 6:43 PM, Timo Nentwig wrote:
>
> On Dec 9, 2010, at 17:39, David Boxenhorn wrote:
>
> > In other words, if you want to use QUORUM, you need to set RF>=3.
> >
> > (I know because I had exactly the same problem.)
>
> I naively assume that
On Thu, Dec 9, 2010 at 10:43 AM, Timo Nentwig wrote:
>
> On Dec 9, 2010, at 17:39, David Boxenhorn wrote:
>
> > In other words, if you want to use QUORUM, you need to set RF>=3.
> >
> > (I know because I had exactly the same problem.)
>
> I naively assume that if I kill either node that holds N1 (
> I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3),
> N1 will still remain on another node. Only if both fail, I actually lose
> data. But apparently this is not how it works...
Sure, the data that N1 holds is also on another node and you won't
lose it by only losing one of them. But with RF=2, QUORUM requires both
replicas, so requests for that range fail while one of them is down.
I recently ran into a problem during a repair operation where my nodes
completely ran out of space and my whole cluster was... well,
clusterfucked.
I want to make sure how to prevent this problem in the future.
Should I make sure that at all times every node is under 50% of its disk
space? Are there other guidelines to follow?
On Dec 9, 2010, at 17:55, Sylvain Lebresne wrote:
>> I naively assume that if I kill either node that holds N1 (i.e. node 1 or
>> 3), N1 will still remain on another node. Only if both fail, I actually lose
>> data. But apparently this is not how it works...
>
> Sure, the data that N1 holds is
Thanks a lot for the answer
What about the indexing when adding a new element? Is it incremental?
Thanks again
On 9 December 2010 14:38, David Boxenhorn wrote:
> How about a regular CF where keys are n...@n ?
>
> Then, getting a matrix row would be the same cost as getting a matrix
> column (N gets), and it would be very easy to add element N+1.
What do you mean by indexing?
On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon wrote:
> Thanks a lot for the answer
>
> What about the indexing when adding a new element? Is it incremental?
>
> Thanks again
>
>
> On 9 December 2010 14:38, David Boxenhorn wrote:
>
>> How about a regular CF where
OPP is not yet obsolete.
The included secondary indexes still aren't good at finding keys for ranges
of indexed values, such as " name > 'b' and name < 'c' ". This is something
that an OPP index would be good at. Of course, you can do something similar
with one or more rows, so it's not that big a deal.
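For what "something similar with one or more rows" could look like, a rough sketch (plain Python; a dict of sorted column names stands in for one wide index row, and all names are made up):

    from bisect import bisect_left, bisect_right

    # One wide "index row": column name = indexed value, column value = row key.
    # Columns in a CF are stored sorted by name, which is what makes range slices cheap.
    index_row = {"alice": "key1", "bob": "key2", "carol": "key3", "dave": "key4"}

    def slice_columns(row, start, finish):
        """Rough stand-in for a get_slice(start, finish) over sorted column names."""
        names = sorted(row)
        return {n: row[n] for n in names[bisect_left(names, start):bisect_right(names, finish)]}

    # name > 'b' and name < 'c' (approximately: names starting with 'b')
    print(slice_columns(index_row, "b", "bzzz"))   # {'bob': 'key2'}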
If you switch your writes to CL ONE when a failure occurs, you might as well
use ONE for all writes. ONE and QUORUM behave the same when all nodes are
working correctly.
- Tyler
On Thu, Dec 9, 2010 at 11:26 AM, Timo Nentwig wrote:
>
> On Dec 9, 2010, at 17:55, Sylvain Lebresne wrote:
>
> >> I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3),
> >> N1 will still remain on another node.
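The underlying rule is that reads are guaranteed to see the latest write only when the read and write replica counts overlap, i.e. R + W > RF. A small illustration (my own sketch, not from the thread):

    def quorum(rf):
        return rf // 2 + 1

    def overlapping(rf, write_cl, read_cl):
        """Reads see the latest write only if the write and read replica sets must overlap."""
        return write_cl + read_cl > rf

    rf = 3
    print(overlapping(rf, quorum(rf), quorum(rf)))  # QUORUM/QUORUM: True  -> consistent
    print(overlapping(rf, 1, quorum(rf)))           # ONE/QUORUM:    False -> may read stale data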
> And my application would fall back to ONE. Quorum writes will also fail so I
> would also use ONE so that the app stays up. What would I have to do make the
> data to redistribute when the broken node is up again? Simply call nodetool
> repair on it?
There are 3 mechanisms for that:
- hinted handoff
- read repair
- anti-entropy repair (nodetool repair)
> I recently ran into a problem during a repair operation where my nodes
> completely ran out of space and my whole cluster was... well, clusterfucked.
>
> I want to make sure how to prevent this problem in the future.
Depending on which version you're on, you may be seeing this:
https://issue
If you are on 0.6, repair is particularly dangerous with respect to disk
space usage. If your replica is sufficiently out of sync, you can triple
your disk usage pretty easily. This has been improved in 0.7, so repairs
should use about half as much disk space, on average.
In general, yes, keep your disk usage under 50%.
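Rough back-of-the-envelope numbers for that headroom, using the worst-case factors mentioned above (my own arithmetic, not from the thread):

    def required_disk(data_gb, worst_case_factor):
        """Capacity needed so a repair/compaction blow-up still fits on disk."""
        return data_gb * worst_case_factor

    data_gb = 100
    print("0.6 worst case (~3x):   ", required_disk(data_gb, 3.0), "GB per node")
    print("0.7 improved (~half of that):", required_disk(data_gb, 1.5), "GB per node")
    # Hence the rule of thumb in this thread: keep live data under ~50% of each disk.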
I mean if I have secondary indexes. Apparently they are calculated in the
background...
On 9 December 2010 18:33, David Boxenhorn wrote:
> What do you mean by indexing?
>
> On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon wrote:
>
>> Thanks a lot for the answer
>>
>> What about the indexing when
What do you mean by, "The included secondary indexes still aren't good at
finding keys for ranges of indexed values, such as " name > 'b' and name <
'c' "."?
Do you mean that secondary indexes don't support range queries at all?
Besides supporting range queries, I see the importance of secondary indexes.
Are there any plans to improve this in the future?
For big data clusters this could be very expensive. Based on your
comment, I will need 200TB of storage for 100TB of data to keep
Cassandra running.
--
Rustam.
On 09/12/2010 17:56, Tyler Hobbs wrote:
> If you are on 0.6, repair is particularly dangerous with respect to disk
> space usage.
Hi good people.
I underestimated load during peak times and now I'm stuck with our production
cluster.
Right now it's 3 nodes, RF 3, so everything is everywhere. We have ~300GB data
load, ~10MB/sec incoming traffic and ~50 (peak) reads/sec to the cluster.
The problem derives from our quorum reads.
> Currently I am copying all data files (thats all existing data) from one node
> to the new nodes in hope that I could than manually assign them their new
> tokenrange (nodetool move) and do cleanup.
Unless I'm misunderstanding you, I believe you should be setting the
initial token on the new nodes; nodetool move is for moving a node that is
already in the ring to a new token.
That depends on your scenario. In the worst case of one big CF, there's not
much that can be easily done for the disk usage of compaction and cleanup
(which is essentially compaction).
If, instead, you have several column families and no single CF makes up the
majority of your data, you can push your disk usage above 50%.
I recently finished a practice expansion of 4 nodes to 5 nodes, a series
of "nodetool move", "nodetool cleanup" and JMX GC steps. I found that in
some of the steps, disk usage actually grew to 2.5x the base data size on
one of the nodes. I'm using 0.6.4.
-scott
On Thu, 9 Dec 2010, Rustam Al
Am assuming you have one matrix and you know the dimensions. Also, as you say,
the most important queries are to get an entire column or an entire row.
I would consider using a standard CF for the Columns and one for the Rows. The
key for each would be the col / row number; each Cassandra column name would be
the index along the other dimension, with the cell value as the column value.
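A rough sketch of that two-CF layout (plain Python dicts standing in for the two column families; details beyond the message above are assumptions):

    rows_cf = {}   # key = row number  -> {column number: value}
    cols_cf = {}   # key = col number  -> {row number: value}

    def put(i, j, value):
        # Write the cell into both CFs so either a whole row or a whole column
        # can be fetched with a single key lookup (one get, one machine).
        rows_cf.setdefault(i, {})[j] = value
        cols_cf.setdefault(j, {})[i] = value

    put(3, 7, "related")
    put(3, 9, "related")
    print(rows_cf[3])   # entire matrix row 3 in one get
    print(cols_cf[7])   # entire matrix column 7 in one get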
On Thu, Dec 9, 2010 at 12:16 PM, David Boxenhorn wrote:
> What do you mean by, "The included secondary indexes still aren't good at
> finding keys for ranges of indexed values, such as " name > 'b' and name <
> 'c' "."?
>
> Do you mean that secondary indexes don't support range queries at all?
ht
Just wanted to give a shout-out to Jonathan Ellis & the Riptano team for the
awesome training they provided yesterday in Santa Monica. It was awesome,
and I'd highly recommend it for anyone who is using or seriously considering
using Cassandra.
Just. freakin awesome.
Dave Viner
Overall, I don't think this is a crazy idea, though I think I'd prefer
Cassandra to manage this setup.
The problem you will run into is that because the storage port is
assumed to be the same across the cluster, you'll only be able to do
this if you can assign multiple IPs to each server (one for each instance).
Hi there,
Quoting some information from the wiki about Cassandra limitations
(http://wiki.apache.org/cassandra/CassandraLimitations):
... So all the data from a given columnfamily/key pair had to fit in memory,
or 2GB ...
Does this mean:
1. A ColumnFamily can only be 2GB of data?
2. A columnfamily/key pair (i.e. a single row) can only be 2GB of data?
Yes, that's correct, but I wouldn't push it too far. You'll become much
more sensitive to disk usage changes; in particular, rebalancing your
cluster will be particularly difficult, and repair will also become dangerous.
Disk performance also tends to drop when a disk nears capacity.
There's no recommendation that fits every case, though.
Additionally, cleanup will fail to run when the disk is more than 50% full.
Another reason to stay below 50%.
On Thu, Dec 9, 2010 at 6:03 PM, Tyler Hobbs wrote:
> Yes, that's correct, but I wouldn't push it too far. You'll become much
> more sensitive to disk usage changes; in particular, rebalancing your cluster
Thanks Tyler, this is really useful.
Also, I noticed that you can specify multiple data file directories
located on different disks. Let's say if I have machine with 4 x 500GB
drives, what would be the difference between following 2 setups:
1. each drive mounted separately and has data file
On Thu, Dec 9, 2010 at 4:20 PM, Rustam Aliyev wrote:
> Thanks Tyler, this is really useful.
> [ RAID0 vs JBOD question ]
> In other words, does splitting data folder into smaller ones bring any
> performance or stability advantages?
This is getting to be a FAQ, so here's my stock answer:
There
On Thu, Dec 9, 2010 at 6:20 PM, Rustam Aliyev wrote:
> Also, I noticed that you can specify multiple data file directories located
> on different disks. Let's say if I have machine with 4 x 500GB drives, what
> would be the difference between following 2 setups:
>
>1. each drive mounted separ
I second that as well. I actually found the training to be fun (love the new
stuff in 0.7.0) and quite interesting. Now I'm looking forward to the next
Cassandra Summit. Thank you Riptano.
On Thu, Dec 9, 2010 at 2:48 PM, Dave Viner wrote:
> Just wanted to give a shout-out to Jonathan Ellis & the
I would also recommend two column families. Storing the key as NxN would
require you to hit multiple machines to query for an entire row or column
with RandomPartitioner. Even with OPP you would need to pick row or columns
to order by, and the other would require hitting multiple machines. Two
column families avoid that problem.
I'd have thought all that turkey and stuffing would have done more
damage to momentum, but judging by the number of bug-fixes in the last
couple of weeks, that isn't the case.
As usual, I'd be remiss if I didn't point out that this is not yet a
stable release. It's getting pretty close, but we're not there yet.
In <= 0.6 (but not 0.7) a row could not be larger than 2GB.
2GB is still the largest possible column value.
On Thu, Dec 9, 2010 at 5:38 PM, Joshua Partogi wrote:
> Hi there,
>
> Quoting some information from the wiki about Cassandra limitations
> (http://wiki.apache.org/cassandra/CassandraLimitations):
describe_schema_versions() returns a Map<String, List<String>> with one
entry. The key is a UUID and the List has one element, which is the IP of
my machine.
I think this has something to do with the 'truncate' command in the CLI; I can
reproduce it by:
1. create a CF with column1 as a secondary index
2. add some rows
3. truncate the CF
This is true, but for larger installations I end up needing more
servers to hold the disks, and more racks to hold the servers, to the point
where the overall cost per GB climbs (granted, the cost per IOP is
probably still good). AIUI, a chunk of that 50% is replicated data, such
that the truly usable space is even smaller.
Can you still reproduce this with rc2, after starting with an empty
data and commitlog directory?
There used to be a bug w/ truncate + 2ary indexes but that should be fixed now.
On Thu, Dec 9, 2010 at 8:53 PM, Wenjun Che wrote:
> describe_schema_versions() returns a Map> with one
> entry. The
On Tue, 2010-12-07 at 21:25 -0500, Edward Capriolo wrote:
> The idea behind "micrandra" is for a 6 disk system run 6 instances of
> Cassandra, one per disk. Use the RackAwareSnitch to make sure no
> replicas live on the same node.
>
> The downsides
> 1) we would have to manage 6x the instances