Even if the data were absolutely evenly distributed, that won't guarantee
that the hash values of the partition keys used in your client queries
won't collide and result in a hotspot.
Another possibility is that your data is not partitioned well at the
primary key level. Are you using clustering keys?
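As a minimal sketch of what partitioning at the primary key level can look
like (keyspace, table, and column names here are hypothetical, loosely
following the alarm/time-bucket model discussed later in this thread):

    cqlsh -e "
      CREATE TABLE ks.alarms (
        sensor_id   text,        -- partition key part 1: spreads data across nodes
        time_bucket text,        -- partition key part 2: e.g. '2015-03', caps partition size
        alarm_time  timestamp,   -- clustering key: orders rows within a partition
        details     text,
        PRIMARY KEY ((sensor_id, time_bucket), alarm_time)
      );"

With only sensor_id in the partition key, one busy sensor can become a
hotspot; folding a time bucket into the partition key keeps any single
partition from growing without bound.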
Hi,
Did you try to run the following on all your nodes and compare ?
du -sh /*whatever*/cassandra/data/*
Of course, if you have unequal snapshot sizes, exclude them from the above
command (or remove the snapshots directly).
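For example, with GNU du (the --exclude flag is not available in every du
implementation; the data path below is the default and may differ on your
install):

    du -sh --exclude=snapshots /var/lib/cassandra/data/*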
This should give you a rough answer to your question about whether the
distribution is even (/!\ h
From: clohfin...@gmail.com
Subject: Re: data distribution along column family partitions
> not ok :) don't let a single partition get to 1 GB; 100s of MB should be when
> flares are going up. The main reasoning is compactions would be horrifically
> slow and there will
> page the query
> across multiple partitions if Y-X > bucket size.
>
> If I use paging, Cassandra won't try to allocate the whole partition on
> the server node, it will just allocate memory in the heap for that page.
> Check?
>
> Marcelo Valle
>
> From: user@c
If I use paging, Cassandra won't try to allocate the whole partition on the
server node, it will just allocate memory in the heap for that page. Check?
Marcelo Valle
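As a rough sketch of what that looks like from cqlsh (keyspace and table
names are hypothetical, and the PAGING command needs a reasonably recent
cqlsh), the coordinator only materializes one page of the partition per
round trip:

    cqlsh> PAGING 500
    cqlsh> SELECT * FROM ks.alarms WHERE sensor_id = 'sensor-42' AND time_bucket = '2015-03';

The client drivers expose the same knob as a fetch size on the statement.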
From: user@cassandra.apache.org
Subject: Re: data distribution along column family partitions
The data model lgtm. You may need to balance the size of the time buckets
with the amount of alarms to prevent partitions from getting too large. 1
month may be a little large; I would aim to keep the partitions below 25 MB
(can check with nodetool cfstats) or so in size to keep everything happy.
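For example (keyspace/table name hypothetical; on newer versions the command
is nodetool tablestats, and older releases label the lines "Compacted row
... size" instead of "Compacted partition ... bytes"):

    # per-table partition size statistics reported by Cassandra
    nodetool cfstats ks.alarms | grep -i 'compacted partition'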
HH (hinted handoff) is one aspect; the other aspect is that when a new node
joins, some balancing needs to occur, and this may take time as well.
But I also understand it will add a lot of complexity to the code.
Is there anywhere I can read about other concerns one should be aware of?
--
On Tue, Feb 15, 2011 at 3:05 PM, mcasandra wrote:
>
> Is there a way to let the new node join the cluster in the background and make
> it live to clients only after it has finished with node repair, syncing data,
> etc., and at the end sync the keys or trees that are needed before it comes to
> life. I know
Thanks! Would Hector take care of not load balancing to the new node until
it's ready?
Also, when repair is occurring in the background, is there a status I can
look at to see that repair is occurring for key ABC?
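There is no per-key repair status that I know of, but you can watch the
repair machinery at work from the shell while it runs (exact command names
vary a bit across versions):

    nodetool compactionstats   # validation compactions (Merkle tree builds) in flight
    nodetool netstats          # data currently streaming between nodes
    nodetool tpstats           # AntiEntropy stages show repair-related work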
Assuming you aren't changing the RC, the normal bootstrap process takes care
of all the problems like that, making sure things work correctly.
Most importantly, if something fails (either the new node or any of the
existing nodes) you can recover from it.
Just don't connect clients directly to th
Is there a way to let the new node join the cluster in the background and make
it live to clients only after it has finished with node repair, syncing data,
etc., and at the end sync the keys or trees that are needed before it comes to
life. I know it can be tricky since it needs to be live as soon as it ste
Regardless of whether you increase the RF or not, RR (read repair) happens based
on the read_repair_chance setting. RR happens after the request has been replied
to, though, so if you increase the RF and then read, the read might get
stale/missing data. RR would then put the correct value
on all the co
When I increase the replication factor, does repair happen automatically
in the background when a client first tries to access data from a node where
the data does not exist?
Or does nodetool repair need to be run after increasing the replication factor?
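A rough sketch of the usual sequence on a CQL-era cluster (keyspace name and
replication settings here are hypothetical; the repair is then run on each
node that holds the keyspace):

    # 1. raise the replication factor
    cqlsh -e "ALTER KEYSPACE ks
              WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};"
    # 2. stream the data the new replicas are missing
    nodetool repair ks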
On Mon, Feb 14, 2011 at 6:58 PM, Dan Hendry wrote:
> > 1) If I insert a key and want to verify which node it went to then how do I do that?
>
> I don't think you can and there should be no reason to care. Cassandra
> abstracts where data is being stored, think in terms of consistency levels
> 1) If I insert a key and want to verify which node it went to then how do I do that?
I don't think you can and there should be no reason to care. Cassandra
abstracts where data is being stored, think in terms of consistency levels
not actual nodes.
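That said, on reasonably recent versions nodetool can show which replicas
own a given partition key (the keyspace, table, and key below are
hypothetical):

    # prints the addresses of the replicas responsible for this key
    nodetool getendpoints ks alarms sensor-42

The advice to reason in terms of consistency levels rather than specific
nodes still stands.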
> 2) How can I verify if the replication is
#546
#1076
#1169
#1377
etc...
On Sat, Aug 14, 2010 at 12:05 PM, Bill de hÓra wrote:
> That data suggests the inbuilt tools are a hazard and manual workarounds
> less so.
>
> Can you point me at the bugs?
>
> Bill
>
>
> On Fri, 2010-08-13 at 20:30 -0700, Benjamin Black wrote:
>> Number of bugs I'
On Fri, Aug 13, 2010 at 10:13 PM, Stefan Kaufmann wrote:
>> My recommendation is to leave Autobootstrap disabled, copy the
>> datafiles over, and then run cleanup. It is faster and more reliable
>> than streaming, in my experience.
>
> I thought about copying the data manually. However, if I have a
That data suggests the inbuilt tools are a hazard and manual workarounds
less so.
Can you point me at the bugs?
Bill
On Fri, 2010-08-13 at 20:30 -0700, Benjamin Black wrote:
> Number of bugs I've hit doing this with scp: 0
> Number of bugs I've hit with streaming: 2 (and others found more)
>
> My recommendation is to leave Autobootstrap disabled, copy the
> datafiles over, and then run cleanup. It is faster and more reliable
> than streaming, in my experience.
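A sketch of that workflow (paths and host name are hypothetical, and the
details depend on your version and token assignments):

    # 1. leave autobootstrap disabled in the new node's configuration
    # 2. copy the SSTable files over from an existing replica (scp, rsync, ...)
    scp -r existing-node:/var/lib/cassandra/data/* /var/lib/cassandra/data/
    # 3. start the node, then remove the data it does not own
    nodetool cleanup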
I thought about copying the data manually. However, if I have a running
environment
and add a node (or replace a broken one), h
Number of bugs I've hit doing this with scp: 0
Number of bugs I've hit with streaming: 2 (and others found more)
Also easier to monitor progress, manage bandwidth, etc. I just prefer
using specialized tools that are really good at specific things. This
is such a case.
b
On Fri, Aug 13, 2010 at
On Fri, 2010-08-13 at 09:51 -0700, Benjamin Black wrote:
> My recommendation is to leave Autobootstrap disabled, copy the
> datafiles over, and then run cleanup. It is faster and more reliable
> than streaming, in my experience.
What is less reliable about streaming?
Bill
On Fri, Aug 13, 2010 at 9:48 AM, Oleg Anastasjev wrote:
> Benjamin Black b3k.us> writes:
>
>> > 3. I waited for the data to replicate, which didn't happen.
>>
>> Correct, you need to run nodetool repair because the nodes were not
>> present when the writes came in. You can also use a higher
>> c
Benjamin Black b3k.us> writes:
> > 3. I waited for the data to replicate, which didn't happen.
>
> Correct, you need to run nodetool repair because the nodes were not
> present when the writes came in. You can also use a higher
> consistency level to force read repair before returning data, whi
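For instance, on a CQL-era cluster (keyspace, table, and key names here are
hypothetical) either of these pushes the data onto replicas that missed the
writes:

    nodetool repair ks          # explicit anti-entropy repair for the keyspace

or, to force read repair on a specific row, read it at a high consistency
level:

    cqlsh> CONSISTENCY ALL
    cqlsh> SELECT * FROM ks.alarms WHERE sensor_id = 'sensor-42';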
On Thu, Aug 12, 2010 at 8:30 AM, Stefan Kaufmann wrote:
> Hello again,
>
> over the last few days I started several tests with Cassandra and learned
> quite a few things.
>
> However, of course, there are still enough things I need to
> understand. One thing is how the data replication works.
> For my Testing