That sounds like the combined result of anti-compaction and the
size amplification from the default SizeTieredCompactionStrategy. If you
keep repeating those steps, the disk usage will eventually stop growing.
Of course, that's not an excuse to keep repeating it.
To fix this (if you rea
Hello Team,
Sorry if this is a simple question.
I was working on Cassandra 2.1.14
Node1 -- 4.5 MB data
Node2 -- 5.3 MB data
Node3 -- 4.9 MB data
Node3 had been down for 90 days.
I brought it up and it joined the cluster.
To sync data I ran nodetool repair --full
Repair was successful...how
Even if the data were absolutely evenly distributed, that won't guarantee
that the hash values of the partition keys used in your client queries
won't collide and result in a hotspot.
Another possibility is that your data is not partitioned well at the
primary key level. Are you using clustering key
Hi,
Did you try to run the following on all your nodes and compare ?
du -sh /*whatever*/cassandra/data/*
Of course if you have unequal snapshots sizes remove them in the above
command (or directly remove them).
This should (at least partly) answer your question about an eventual even
distribution (/!\ h
Is there a way to find out how data is distributed within a column family on
each node? Nodetool shows how data is distributed across nodes, but only as a
total per node. We are seeing heavy load on one node and I suspect that
partitioning is not distributing data equally. But to prove t
From: clohfin...@gmail.com
Subject: Re: data distribution along column family partitions
> not ok :) don't let a single partition get to 1 GB; 100s of MB is when the
> flares should be going up. The main reasoning is that compactions would be
> horrifically slow and there will
page the query
> across multiple partitions if Y-X > bucket size.
>
> If I use paging, Cassandra won't try to allocate the whole partition on
> the server node, it will just allocate memory in the heap for that page.
> Check?
>
> Marcelo Valle
>
> From: user@c
If I use paging, Cassandra won't try to allocate the whole partition on the
server node, it will just allocate memory in the heap for that page. Check?
Marcelo Valle
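The paging idea above can be sketched: when the queried range [X, Y] spans more than one time bucket, the client enumerates the bucket partition keys and queries (and pages within) each one. The `user:YYYYMM` key format is an assumed illustration matching the user + month-bucket model in this thread, not the poster's actual schema.

```python
# Sketch: enumerate the month-bucket partition keys covering [start, end].
# The "user:YYYYMM" key format is an assumption for illustration.
from datetime import date

def month_buckets(user, start, end):
    buckets = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        buckets.append(f"{user}:{y:04d}{m:02d}")
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return buckets

print(month_buckets("alice", date(2011, 11, 5), date(2012, 2, 1)))
# → ['alice:201111', 'alice:201112', 'alice:201201', 'alice:201202']
```

The client then issues one ranged query per bucket, paging within each, so no single request materializes a whole partition on the server.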
From: user@cassandra.apache.org
Subject: Re: data distribution along column family partitions
The data model lgtm. You may need to balance the size of the time buckets
with the number of alarms to prevent partitions from getting too large. 1
month may be a little large; I would aim to keep the partitions below 25 MB
(you can check with nodetool cfstats) or so to keep everything happy.
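A back-of-the-envelope way to pick the bucket size against that 25 MB target (the 200-byte average row size is an assumed figure, not from the thread):

```python
# Rough partition budget: how many alarm rows fit under the ~25 MB
# target? row_bytes is an assumed average serialized row size.
row_bytes = 200
budget = 25 * 1024 ** 2
print(budget // row_bytes)  # → 131072
```

If a user can plausibly accumulate more alarms than that in one month, shrink the bucket (to a week, say).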
Hello,
I am designing a model to store alerts users receive over time. I will want
to store probably the last two years of alerts for each user.
The first thought I had was having a column family partitioned by user +
timebucket, where time bucket could be something like year + month. For
instanc
You should be ok, depending on the partitioner you use. The keys
end up stored by their hash (which is why, when you're setting up your nodes,
you can give each one a specific token). Whatever your key is will be used to
create an MD5 hash, and that hash will then determine what node your data wil
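A toy sketch of that placement logic: with RandomPartitioner, the MD5 of the key yields a token, and the node whose token range contains it owns the row. The ring tokens below are invented for illustration; the real token arithmetic differs in detail.

```python
# Toy model of RandomPartitioner placement: MD5(key) -> token in
# [0, 2**127), then walk the ring to the first node token >= the key's
# token, wrapping around. Node tokens here are made up.
import hashlib

def md5_token(key: bytes) -> int:
    return int.from_bytes(hashlib.md5(key).digest(), "big") % (2 ** 127)

def owner(token, ring):
    # ring: sorted list of (token, node_name) pairs
    for t, node in ring:
        if token <= t:
            return node
    return ring[0][1]  # wrap around

ring = [(2 ** 125, "node1"), (2 ** 126, "node2"), (3 * 2 ** 125, "node3")]
# Lexically similar keys still hash to widely scattered tokens:
for key in (b"2011/10/10-16:07:53", b"2011/10/10-16:07:54"):
    print(key.decode(), md5_token(key))
```

This is why near-identical timestamp keys distribute evenly under a hashing partitioner (whereas an order-preserving partitioner would cluster them on one node).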
Hi,
I am planing to make tests on Cassandra with a few nodes. I want to create a
column family where the key will be the date down to the second (like
2011/10/10-16:07:53). Doing so, my keys will be very similar to each other.
Is it ok to use such keys if I want my data to be evenly distributed?
--
View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-distribution-tp6025869p6030157.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at
Nabble.com.
On Tue, Feb 15, 2011 at 3:05 PM, mcasandra wrote:
>
> Is there a way to let a new node join the cluster in the background and make it
> live to clients only after it has finished node repair, syncing data,
> etc., and only at the end sync whatever keys or trees are needed before it comes
> to life. I know
soon as it steals
the keys.
This way we know we are adding nodes only when we think it's all ready.
to run after increasing the replication factor.
On Mon, Feb 14, 2011 at 6:58 PM, Dan Hendry wrote:
> > 1) If I insert a key and want to verify which node it went to then how do
> I
> > do that?
>
> I don't think you can and there should be no reason to care. Cassandra
> abstracts where data is being stored, think in terms of consistency levels
-----Original Message-----
From: mcasandra [mailto:mohitanch...@gmail.com]
Sent: February-14-11 19:45
To: cassandra-u...@incubator.apache.org
Subject: Data distribution
Couple of questions:
1) If I insert a key and want to verify which node it went to then how do I
do that?
2) How can I verify if the replication is
and change the replication
factor, say from 2 to 3. Would Cassandra automatically replicate the old data to
the 3rd node?
#546
#1076
#1169
#1377
etc...
On Sat, Aug 14, 2010 at 12:05 PM, Bill de hÓra wrote:
> That data suggests the inbuilt tools are a hazard and manual workarounds
> less so.
>
> Can you point me at the bugs?
>
> Bill
>
>
> On Fri, 2010-08-13 at 20:30 -0700, Benjamin Black wrote:
>> Number of bugs I'
On Fri, Aug 13, 2010 at 10:13 PM, Stefan Kaufmann wrote:
>> My recommendation is to leave Autobootstrap disabled, copy the
>> datafiles over, and then run cleanup. It is faster and more reliable
>> than streaming, in my experience.
>
> I thought about copying the data manually. However, if I have a
That data suggests the inbuilt tools are a hazard and manual workarounds
less so.
Can you point me at the bugs?
Bill
On Fri, 2010-08-13 at 20:30 -0700, Benjamin Black wrote:
> Number of bugs I've hit doing this with scp: 0
> Number of bugs I've hit with streaming: 2 (and others found more)
>
> My recommendation is to leave Autobootstrap disabled, copy the
> datafiles over, and then run cleanup. It is faster and more reliable
> than streaming, in my experience.
I thought about copying the data manually. However, if I have a running
environment
and add a node (or replace a broken one), h
Number of bugs I've hit doing this with scp: 0
Number of bugs I've hit with streaming: 2 (and others found more)
Also easier to monitor progress, manage bandwidth, etc. I just prefer
using specialized tools that are really good at specific things. This
is such a case.
b
On Fri, Aug 13, 2010 at
On Fri, 2010-08-13 at 09:51 -0700, Benjamin Black wrote:
> My recommendation is to leave Autobootstrap disabled, copy the
> datafiles over, and then run cleanup. It is faster and more reliable
> than streaming, in my experience.
What is less reliable about streaming?
Bill
On Fri, Aug 13, 2010 at 9:48 AM, Oleg Anastasjev wrote:
> Benjamin Black <b3k.us> writes:
>
>> > 3. I waited for the data to replicate, which didn't happen.
>>
>> Correct, you need to run nodetool repair because the nodes were not
>> present when the writes came in. You can also use a higher
>> c
Benjamin Black <b3k.us> writes:
> > 3. I waited for the data to replicate, which didn't happen.
>
> Correct, you need to run nodetool repair because the nodes were not
> present when the writes came in. You can also use a higher
> consistency level to force read repair before returning data, whi
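The overlap rule behind that consistency-level advice: a read is guaranteed to see the latest write whenever the read and write replica sets intersect, i.e. R + W > RF. A quick check:

```python
# R + W > RF guarantees the read set intersects the write set, so at
# least one contacted replica returns the latest value (and read repair
# can fix the stale ones).
RF = 3
for R, W in [(1, 1), (2, 2), (1, 3)]:
    print(f"R={R} W={W} strongly consistent: {R + W > RF}")
# R=1 W=1 → False; R=2 W=2 → True; R=1 W=3 → True
```

With RF=3 and writes at ONE while two nodes were down, reads at ONE may miss the data until repair runs; QUORUM reads/writes avoid that window.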
On Thu, Aug 12, 2010 at 8:30 AM, Stefan Kaufmann wrote:
> Hello again,
>
> over the last few days I ran several tests with Cassandra and learned quite a few
> facts.
>
> However, of course, there are still enough things I need to
> understand. One thing is, how the data replication works.
> For my Testing
Hello again,
over the last few days I ran several tests with Cassandra and learned quite a few facts.
However, of course, there are still enough things I need to
understand. One thing is, how the data replication works.
For my testing:
1. I set the replication factor to 3, started with 1 active node (the
On Thu, Jun 17, 2010 at 8:52 AM, Mubarak Seyed wrote:
> Hi All,
>
> Regarding the client Thrift connection: I have 4 nodes which form a ring, but
> the client only knows the IP address of one node (and the Thrift RPC port
> number),
> how can the client connect to any other node without getting rin
Hi All,
Regarding the client Thrift connection: I have 4 nodes which form a ring, but
the client only knows the IP address of one node (and the Thrift RPC port number).
How can the client connect to any other node without getting ring
information? Can we keep the load balancer and bind all the fo
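One client-side answer, sketched below: connect to the seed, ask it for the rest of the ring (the Thrift API has a discovery call of this kind, describe_ring; here it is stubbed as an injected `fetch_peers`), then round-robin requests over all discovered hosts. Names and addresses are illustrative.

```python
# Sketch: ring discovery from one seed plus client-side round-robin.
# fetch_peers stands in for a real discovery call (e.g. Thrift's
# describe_ring); the caller injects it.
import itertools

class HostPool:
    def __init__(self, seed, fetch_peers):
        peers = [h for h in fetch_peers(seed) if h != seed]
        self.hosts = [seed] + peers
        self._rr = itertools.cycle(self.hosts)

    def next_host(self):
        return next(self._rr)

pool = HostPool("10.0.0.1",
                lambda seed: ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])
print([pool.next_host() for _ in range(5)])
# → ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.4', '10.0.0.1']
```

A load balancer in front of all four RPC ports works too; the client-side pool just avoids the extra hop and the single point of failure.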