Re: Full repair results in uneven data distribution

2021-03-16 Thread Bowen Song
That sounds like the combined results from the anti-compaction and the size amplification from the default SizeTieredCompactionStrategy. If you keep repeating those steps, the disk usage will eventually stop growing. Of course, that's not an excuse to keep repeating it. To fix this (if you rea

Full repair results in uneven data distribution

2021-03-16 Thread Inquistive allen
Hello Team, Sorry if this is a simple question. I was working on Cassandra 2.1.14. Node1 -- 4.5 MB data, Node2 -- 5.3 MB data, Node3 -- 4.9 MB data. Node3 had been down for 90 days. I brought it up and it joined the cluster. To sync data I ran nodetool repair --full. Repair was successful...how

Re: Data Distribution in Table/Column Family

2015-08-27 Thread Jack Krupansky
Even if the data were absolutely evenly distributed, that won't guarantee that the hash values of the partition keys used in your client queries won't collide and result in a hotspot. Another possibility is that your data is not partitioned well at the primary key level. Are you using clustering key

Re: Data Distribution in Table/Column Family

2015-08-27 Thread Alain RODRIGUEZ
Hi, Did you try running the following on all your nodes and comparing the results? du -sh /*whatever*/cassandra/data/* Of course, if you have unequal snapshot sizes, exclude them from the above command (or remove them directly). This should answer (barely) your question about an eventual even distribution (/!\ h
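The per-node comparison suggested above can also be scripted. What follows is only a minimal sketch, not part of the original message: the function name data_size_bytes and the assumption that snapshots live in directories literally named "snapshots" are illustrative. It sums file sizes in bytes, which is not the same figure du reports (du counts allocated disk blocks).

```python
import os


def data_size_bytes(root, skip_dirs=("snapshots",)):
    """Sum the sizes of all regular files under root, skipping snapshot dirs.

    Hypothetical helper: skip_dirs assumes snapshots are kept in
    directories named "snapshots" somewhere below root.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Running it against the same data directory on every node, then comparing the totals per keyspace or table directory, gives roughly the same picture as the du command with snapshots left out.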

Data Distribution in Table/Column Family

2015-08-27 Thread Saladi Naidu
Is there a way to find out how data is distributed within a column family by each node? Nodetool reports how data is distributed across nodes, but it only shows the total data per node. We are seeing heavy load on one node and I suspect that partitioning is not distributing data equally. But to prove t

Re: data distribution along column family partitions

2015-02-04 Thread Marcelo Valle (BLOOMBERG/ LONDON)
From: clohfin...@gmail.com Subject: Re: data distribution along column family partitions > not ok :) don't let a single partition get to 1gb, 100's of mb should be when > flares are going up. The main reasoning is compactions would be horrifically > slow and there will

Re: data distribution along column family partitions

2015-02-04 Thread Chris Lohfink
page the query > across multiple partitions if Y-X > bucket size. > > If I use paging, Cassandra won't try to allocate the whole partition on > the server node, it will just allocate memory in the heap for that page. > Check? > > Marcelo Valle > > From: user@c

Re: data distribution along column family partitions

2015-02-04 Thread Marcelo Valle (BLOOMBERG/ LONDON)
won't try to allocate the whole partition on the server node, it will just allocate memory in the heap for that page. Check? Marcelo Valle From: user@cassandra.apache.org Subject: Re: data distribution along column family partitions The data model lgtm. You may need to balance the size of the ti

Re: data distribution along column family partitions

2015-02-04 Thread Chris Lohfink
The data model lgtm. You may need to balance the size of the time buckets with the amount of alarms to prevent partitions from getting too large. 1 month may be a little large, I would aim to keep the partitions below 25mb (can check with nodetool cfstats) or so in size to keep everything happy.

data distribution along column family partitions

2015-02-04 Thread Marcelo Elias Del Valle
Hello, I am designing a model to store alerts users receive over time. I will want to store probably the last two years of alerts for each user. The first thought I had was having a column family partitioned by user + timebucket, where time bucket could be something like year + month. For instanc
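The user + time bucket idea described in this message can be sketched as follows. This is an illustration under assumptions, not the poster's actual schema: the function names bucket, partition_key and buckets_between are hypothetical, and the bucket is taken to be year + month as mentioned in the message. A range query over alerts then has to visit every bucket the range touches.

```python
from datetime import date


def bucket(d):
    """Monthly time bucket for a date, e.g. date(2015, 2, 4) -> "201502"."""
    return f"{d.year:04d}{d.month:02d}"


def partition_key(user_id, d):
    """Composite partition key: one partition per user per month."""
    return (user_id, bucket(d))


def buckets_between(start, end):
    """All monthly buckets touched by the inclusive date range [start, end]."""
    out = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        out.append(f"{year:04d}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out
```

A client reading alerts between two dates would issue one query per entry in buckets_between(start, end). A smaller bucket (week or day) keeps partitions smaller at the cost of more queries per range.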

Re: A good key for data distribution over nodes

2011-10-10 Thread David McNelis
You should be ok, depending on the partitioner strategy you use. The keys end up created as a hash (which is why, when you're setting up your nodes, you can give them a specific key). Then whatever your key is will be used to create an MD5 hash, and that hash will then determine what node your data wil
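The point about hashing can be illustrated with a small sketch. It is a simplification under stated assumptions, not Cassandra's actual partitioner code: md5_token, owner and the three-node ring with evenly spaced tokens are all hypothetical, and the real token computation differs in detail. It shows that near-identical date keys still hash to widely scattered tokens.

```python
import hashlib
from bisect import bisect_left
from collections import Counter
from datetime import datetime, timedelta

TOKEN_SPACE = 2**127  # assumed token range for this sketch


def md5_token(key):
    """Map a row key to an integer token via MD5 (simplified)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def owner(token, ring):
    """Return the node owning a token.

    ring is a list of (node_token, node_name) sorted by node_token.
    The owner is the first node whose token is >= the key's token,
    wrapping around to the first node past the end of the ring.
    """
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, token)
    return ring[i % len(ring)][1]


if __name__ == "__main__":
    # Hypothetical 3-node ring with evenly spaced tokens.
    ring = [(i * TOKEN_SPACE // 3, f"node{i + 1}") for i in range(3)]
    start = datetime(2011, 10, 10, 16, 7, 53)
    keys = [(start + timedelta(seconds=s)).strftime("%Y/%m/%d-%H:%M:%S")
            for s in range(3000)]
    # Count how many of the consecutive-second keys land on each node.
    print(Counter(owner(md5_token(k), ring) for k in keys))
```

With an order-preserving partitioner the same keys would sort next to each other and land on the same node, which is why the answer depends on the partitioner in use.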

A good key for data distribution over nodes

2011-10-10 Thread Laurent Aufrechter
Hi, I am planning to run tests on Cassandra with a few nodes. I want to create a column family where the key will be the date down to the second (like 2011/10/10-16:07:53). Doing so, my keys will be very similar to each other. Is it ok to use such keys if I want my data to be evenly distribu

Re: Data distribution

2011-02-15 Thread mcasandra
? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-distribution-tp6025869p6030157.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.

Re: Data distribution

2011-02-15 Thread Robert Coli
On Tue, Feb 15, 2011 at 3:05 PM, mcasandra wrote: > > Is there a way to let the new node join cluster in the background and make it > live to clients only after it has finished with node repair, syncing data > etc. and in the end sync keys or trees that's needed before it's come to > life. I know

Re: Data distribution

2011-02-15 Thread mcasandra

Re: Data distribution

2011-02-15 Thread Matthew Dennis

Re: Data distribution

2011-02-15 Thread mcasandra
soon as it steals the keys. This way we know we are adding nodes only when we think it's all ready.

Re: Data distribution

2011-02-14 Thread Matthew Dennis
to run after increasing the replication factor.

RE: Data distribution

2011-02-14 Thread mcasandra

Re: Data distribution

2011-02-14 Thread Matthew Dennis
On Mon, Feb 14, 2011 at 6:58 PM, Dan Hendry wrote: > > 1) If I insert a key and want to verify which node it went to then how do > I > > do that? > > I don't think you can and there should be no reason to care. Cassandra > abstracts where data is being stored, think in terms of consistency levels

RE: Data distribution

2011-02-14 Thread Dan Hendry
riginal Message- From: mcasandra [mailto:mohitanch...@gmail.com] Sent: February-14-11 19:45 To: cassandra-u...@incubator.apache.org Subject: Data distribution Couple of questions: 1) If I insert a key and want to verify which node it went to then how do I do that? 2) How can I verify if the replication is

Data distribution

2011-02-14 Thread mcasandra
and change the replication factor, say from 2 to 3. Would Cassandra automatically replicate the old data to the 3rd node?

Re: Data Distribution / Replication

2010-08-14 Thread Benjamin Black
#546 #1076 #1169 #1377 etc... On Sat, Aug 14, 2010 at 12:05 PM, Bill de hÓra wrote: > That data suggests the inbuilt tools are a hazard and manual workarounds > less so. > > Can you point me at the bugs? > > Bill > > > On Fri, 2010-08-13 at 20:30 -0700, Benjamin Black wrote: >> Number of bugs I'

Re: Data Distribution / Replication

2010-08-14 Thread Benjamin Black
On Fri, Aug 13, 2010 at 10:13 PM, Stefan Kaufmann wrote: >> My recommendation is to leave Autobootstrap disabled, copy the >> datafiles over, and then run cleanup. It is faster and more reliable >> than streaming, in my experience. > > I thought about copying the data manually. However if I have a

Re: Data Distribution / Replication

2010-08-14 Thread Bill de hÓra
That data suggests the inbuilt tools are a hazard and manual workarounds less so. Can you point me at the bugs? Bill On Fri, 2010-08-13 at 20:30 -0700, Benjamin Black wrote: > Number of bugs I've hit doing this with scp: 0 > Number of bugs I've hit with streaming: 2 (and others found more) >

Re: Data Distribution / Replication

2010-08-13 Thread Stefan Kaufmann
> My recommendation is to leave Autobootstrap disabled, copy the > datafiles over, and then run cleanup. It is faster and more reliable > than streaming, in my experience. I thought about copying the data manually. However if I have a running environment and add a node (or replace a broken one), h

Re: Data Distribution / Replication

2010-08-13 Thread Benjamin Black
Number of bugs I've hit doing this with scp: 0 Number of bugs I've hit with streaming: 2 (and others found more) Also easier to monitor progress, manage bandwidth, etc. I just prefer using specialized tools that are really good at specific things. This is such a case. b On Fri, Aug 13, 2010 at

Re: Data Distribution / Replication

2010-08-13 Thread Bill de hÓra
On Fri, 2010-08-13 at 09:51 -0700, Benjamin Black wrote: > My recommendation is to leave Autobootstrap disabled, copy the > datafiles over, and then run cleanup. It is faster and more reliable > than streaming, in my experience. What is less reliable about streaming? Bill

Re: Data Distribution / Replication

2010-08-13 Thread Benjamin Black
On Fri, Aug 13, 2010 at 9:48 AM, Oleg Anastasjev wrote: > Benjamin Black b3k.us> writes: > >> > 3. I waited for the data to replicate, which didn't happen. >> >> Correct, you need to run nodetool repair because the nodes were not >> present when the writes came in.  You can also use a higher >> c

Re: Data Distribution / Replication

2010-08-13 Thread Oleg Anastasjev
Benjamin Black b3k.us> writes: > > 3. I waited for the data to replicate, which didn't happen. > > Correct, you need to run nodetool repair because the nodes were not > present when the writes came in. You can also use a higher > consistency level to force read repair before returning data, whi

Re: Data Distribution / Replication

2010-08-12 Thread Benjamin Black
On Thu, Aug 12, 2010 at 8:30 AM, Stefan Kaufmann wrote: > Hello again, > > over the last few days I ran several tests with Cassandra and learned quite a few > facts. > > However, of course, there are still plenty of things I need to > understand. One thing is how the data replication works. > For my testing

Data Distribution / Replication

2010-08-12 Thread Stefan Kaufmann
Hello again, over the last few days I ran several tests with Cassandra and learned quite a few facts. However, of course, there are still plenty of things I need to understand. One thing is how the data replication works. For my testing: 1. I set the replication factor to 3, started with 1 active node (the

Re: Client connection and data distribution across nodes

2010-06-16 Thread Ran Tavory
On Thu, Jun 17, 2010 at 8:52 AM, Mubarak Seyed wrote: > Hi All, > > Regarding the client thrift connection, I have 4 nodes which formed a ring, but > the client only knows the IP address of one node (and thrift RPC port > number), > how can the client connect to any other node without getting rin

Client connection and data distribution across nodes

2010-06-16 Thread Mubarak Seyed
Hi All, Regarding the client thrift connection, I have 4 nodes which formed a ring, but the client only knows the IP address of one node (and thrift RPC port number), how can the client connect to any other node without getting ring information? Can we keep the load balancer and bind all the fo