Re: Data Distribution in Table/Column Family

2015-08-27 Thread Jack Krupansky
Even if the data were absolutely evenly distributed, that won't guarantee that the hash values of the partition keys used in your client queries won't collide and result in a hotspot. Another possibility is that your data is not partitioned well at the primary key level. Are you using clustering key
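
A quick way to see where a given partition key lands, and whether ring ownership is skewed, is nodetool; a minimal sketch, where the keyspace, table, and key names are placeholders:

    # which replica nodes own this partition key?
    nodetool getendpoints my_keyspace my_table some_key

    # per-node effective ownership for the keyspace, to spot skew at the ring level
    nodetool status my_keyspace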

Re: Data Distribution in Table/Column Family

2015-08-27 Thread Alain RODRIGUEZ
Hi, Did you try to run the following on all your nodes and compare? du -sh /*whatever*/cassandra/data/* Of course if you have unequal snapshot sizes, exclude them from the above command (or remove them directly). This should answer (barely) your question about an eventual even distribution (/!\ h
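
A sketch of running that comparison across the cluster in one go, assuming GNU du (for --exclude) and SSH access to each node; the host names and data path are placeholders:

    for host in node1 node2 node3; do
      echo "== $host =="
      # snapshot directories are excluded so they don't skew the totals
      ssh "$host" "du -sh --exclude='snapshots' /var/lib/cassandra/data/*"
    done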

Re: data distribution along column family partitions

2015-02-04 Thread Marcelo Valle (BLOOMBERG/ LONDON)
From: clohfin...@gmail.com Subject: Re: data distribution along column family partitions > not ok :) don't let a single partition get to 1 GB; 100s of MB is when the > flares should be going up. The main reasoning is compactions would be horrifically > slow and there will

Re: data distribution along column family partitions

2015-02-04 Thread Chris Lohfink
page the query > across multiple partitions if Y-X > bucket size. > > If I use paging, Cassandra won't try to allocate the whole partition on > the server node, it will just allocate memory in the heap for that page. > Check? > > Marcelo Valle > > From: user@c
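
For what it's worth, paging can be exercised directly from a recent cqlsh, which fetches one page of rows at a time from the coordinator rather than the whole partition; the keyspace, table, and column names here are hypothetical:

    cqlsh> PAGING 500
    cqlsh> SELECT * FROM metrics.alarms WHERE bucket = '2015-02';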

Re: data distribution along column family partitions

2015-02-04 Thread Marcelo Valle (BLOOMBERG/ LONDON)
…won't try to allocate the whole partition on the server node, it will just allocate memory in the heap for that page. Check? Marcelo Valle From: user@cassandra.apache.org Subject: Re: data distribution along column family partitions The data model lgtm. You may need to balance the size of the ti

Re: data distribution along column family partitions

2015-02-04 Thread Chris Lohfink
The data model lgtm. You may need to balance the size of the time buckets with the amount of alarms to prevent partitions from getting too large. 1 month may be a little large; I would aim to keep the partitions below 25 MB or so (you can check with nodetool cfstats) to keep everything happy.
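
To check partition sizes as suggested, the cfstats output includes compacted partition size lines; a sketch, with the keyspace and table names as placeholders (exact labels vary slightly by version):

    nodetool cfstats my_keyspace.alarms | grep -E 'Compacted partition (minimum|mean|maximum) bytes'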

Re: Data distribution

2011-02-15 Thread mcasandra
HH (hinted handoff) is one aspect; the other is that when a new node joins, some balancing needs to occur, and this may take time as well. But I also understand it would add a lot of complexity to the code. Is there any place where I can read about other concerns one should be aware of?

Re: Data distribution

2011-02-15 Thread Robert Coli
On Tue, Feb 15, 2011 at 3:05 PM, mcasandra wrote: > > Is there a way to let the new node join cluster in the background and make it > live to clients only after it has finished with node repair, syncing data > etc. and in the end sync keys or trees that's needed before it's come to > life. I know

Re: Data distribution

2011-02-15 Thread mcasandra
Thanks! Would Hector take care of not load balancing to the new node until it's ready? Also, when repair is occurring in the background, is there a status that I can look at to see that repair is occurring for key ABC?
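
There is no per-key repair status, but overall progress is visible from nodetool; a sketch (output formats vary by version):

    nodetool netstats          # streams currently flowing in/out of this node
    nodetool compactionstats   # validation compactions kicked off by repair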

Re: Data distribution

2011-02-15 Thread Matthew Dennis
Assuming you aren't changing the RF, the normal bootstrap process takes care of all the problems like that, making sure things work correctly. Most importantly, if something fails (either the new node or any of the existing nodes) you can recover from it. Just don't connect clients directly to th

Re: Data distribution

2011-02-15 Thread mcasandra
Is there a way to let the new node join the cluster in the background and make it live to clients only after it has finished with node repair, syncing data etc., and in the end sync whatever keys or trees are needed before it comes to life? I know it can be tricky since it needs to be live as soon as it ste

Re: Data distribution

2011-02-14 Thread Matthew Dennis
Regardless of increasing RF or not, RR happens based on the read_repair_chance setting. RR happens after the request has been replied to, though, so it's possible that if you increase the RF and then read, the read might get stale/missing data. RR would then put the correct value on all the co
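
On CQL-era versions, the chance that a read triggers RR is a per-table setting; a sketch with hypothetical names (note the option was removed in Cassandra 4.0):

    cqlsh -e "ALTER TABLE my_keyspace.my_table WITH read_repair_chance = 0.1"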

RE: Data distribution

2011-02-14 Thread mcasandra
When I increase the replication factor, does repair happen automatically in the background when a client first tries to access data from a node where the data does not exist? Or does nodetool repair need to be run after increasing the replication factor?
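
The usual sequence on CQL-era versions is to raise the RF and then repair explicitly, rather than waiting for read repair to fix things lazily; the keyspace name and strategy here are placeholders:

    cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = \
      {'class': 'SimpleStrategy', 'replication_factor': 3}"
    nodetool repair my_keyspace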

Re: Data distribution

2011-02-14 Thread Matthew Dennis
On Mon, Feb 14, 2011 at 6:58 PM, Dan Hendry wrote: > > 1) If I insert a key and want to verify which node it went to then how do > I > > do that? > > I don't think you can and there should be no reason to care. Cassandra > abstracts where data is being stored, think in terms of consistency levels

RE: Data distribution

2011-02-14 Thread Dan Hendry
> 1) If I insert a key and want to verify which node it went to then how do I > do that? I don't think you can and there should be no reason to care. Cassandra abstracts where data is being stored, think in terms of consistency levels not actual nodes. > 2) How can I verify if the replication is
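
Thinking in consistency levels rather than nodes looks like this in a recent cqlsh session; the keyspace, table, and key are hypothetical:

    cqlsh> CONSISTENCY QUORUM
    cqlsh> SELECT * FROM my_keyspace.users WHERE id = 'abc';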

Re: Data Distribution / Replication

2010-08-14 Thread Benjamin Black
#546 #1076 #1169 #1377 etc... On Sat, Aug 14, 2010 at 12:05 PM, Bill de hÓra wrote: > That data suggests the inbuilt tools are a hazard and manual workarounds > less so. > > Can you point me at the bugs? > > Bill > > > On Fri, 2010-08-13 at 20:30 -0700, Benjamin Black wrote: >> Number of bugs I'

Re: Data Distribution / Replication

2010-08-14 Thread Benjamin Black
On Fri, Aug 13, 2010 at 10:13 PM, Stefan Kaufmann wrote: >> My recommendation is to leave Autobootstrap disabled, copy the >> datafiles over, and then run cleanup.  It is faster and more reliable >> than streaming, in my experience. > > I thought about copying the data manually. However if I have a

Re: Data Distribution / Replication

2010-08-14 Thread Bill de hÓra
That data suggests the inbuilt tools are a hazard and manual workarounds less so. Can you point me at the bugs? Bill On Fri, 2010-08-13 at 20:30 -0700, Benjamin Black wrote: > Number of bugs I've hit doing this with scp: 0 > Number of bugs I've hit with streaming: 2 (and others found more) >

Re: Data Distribution / Replication

2010-08-13 Thread Stefan Kaufmann
> My recommendation is to leave Autobootstrap disabled, copy the > datafiles over, and then run cleanup.  It is faster and more reliable > than streaming, in my experience. I thought about copying the data manually. However if I have a running environment and add a node (or replace a broken one), h

Re: Data Distribution / Replication

2010-08-13 Thread Benjamin Black
Number of bugs I've hit doing this with scp: 0 Number of bugs I've hit with streaming: 2 (and others found more) Also easier to monitor progress, manage bandwidth, etc. I just prefer using specialized tools that are really good at specific things. This is such a case. b On Fri, Aug 13, 2010 at
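
A sketch of the copy-then-cleanup approach being described, with hosts and paths as placeholders; rsync is shown, but scp works the same way:

    # on a source replica: flush memtables so the on-disk sstables are complete
    nodetool flush my_keyspace

    # copy the keyspace's sstables to the new node
    rsync -av /var/lib/cassandra/data/my_keyspace/ newnode:/var/lib/cassandra/data/my_keyspace/

    # on the new node, once it is up (autobootstrap left disabled):
    nodetool cleanup           # drops data outside the ranges this node owns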

Re: Data Distribution / Replication

2010-08-13 Thread Bill de hÓra
On Fri, 2010-08-13 at 09:51 -0700, Benjamin Black wrote: > My recommendation is to leave Autobootstrap disabled, copy the > datafiles over, and then run cleanup. It is faster and more reliable > than streaming, in my experience. What is less reliable about streaming? Bill

Re: Data Distribution / Replication

2010-08-13 Thread Benjamin Black
On Fri, Aug 13, 2010 at 9:48 AM, Oleg Anastasjev wrote: > Benjamin Black <...@b3k.us> writes: > >> > 3. I waited for the data to replicate, which didn't happen. >> >> Correct, you need to run nodetool repair because the nodes were not >> present when the writes came in.  You can also use a higher >> c

Re: Data Distribution / Replication

2010-08-13 Thread Oleg Anastasjev
Benjamin Black <...@b3k.us> writes: > > 3. I waited for the data to replicate, which didn't happen. > > Correct, you need to run nodetool repair because the nodes were not > present when the writes came in. You can also use a higher > consistency level to force read repair before returning data, whi
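
The repair being referred to, sketched for a newer nodetool; the -pr flag, where available, limits the work to the node's primary token ranges, and the keyspace name is a placeholder:

    nodetool repair -pr my_keyspace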

Re: Data Distribution / Replication

2010-08-12 Thread Benjamin Black
On Thu, Aug 12, 2010 at 8:30 AM, Stefan Kaufmann wrote: > Hello again, > > over the last days I started several tests with Cassandra and learned quite a few > facts. > > However, of course, there are still enough things I need to > understand. One thing is how the data replication works. > For my Testing