Re: Deployment on AWS and replication strategies

Mike Gallamore Sun, 04 Apr 2010 09:14:46 -0700

Pluggable placement: that is cool. It wasn't obvious to me from the documentation 
I read that it was available. I thought the RackAware and RackUnaware strategies 
might be hard-coded somewhere. I'm not a Java developer, so I haven't looked at 
the code much. That said, I'll take a look and see if I can figure out how it 
works. I have coded in C/C++, so I can probably handle the logic part of Java 
code okay.
On 2010-04-04, at 1:18 AM, Benjamin Black wrote:


> On Sat, Apr 3, 2010 at 8:23 PM, Mike Gallamore
> <mike.e.gallam...@googlemail.com> wrote:
>>> 
>> I didn't mean a real-time determination, more for when the nodes aren't 
>> identical. For example, if you have a cluster made up of a bunch of EC2 light 
>> instances and decide to add a large instance, it would be nice if the new 
>> node got an amount of work proportional to its system specs.
> 
> Sure, set the token(s) appropriately.
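To make "set the token(s) appropriately" concrete for mixed instance sizes, here is a minimal sketch that assigns tokens so each node's share of the ring is proportional to a capacity weight. It assumes a RandomPartitioner-style token space of 0 to 2^127 and that a node with token T owns the range (previous token, T]. The class name, the weights, and the idea of a numeric "capacity" are illustrative assumptions, not anything Cassandra provides.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: spread tokens over an assumed 0..2^127 ring so that
 * each node's range width is proportional to its capacity weight.
 */
public class WeightedTokens {
    // Assumed size of the RandomPartitioner token space: 2^127.
    static final BigInteger RING = BigInteger.ONE.shiftLeft(127);

    /**
     * Returns one token per node. Node i owns (token[i-1], token[i]], so its
     * share of the ring is weights[i] / sum(weights).
     */
    public static List<BigInteger> tokens(int[] weights) {
        long total = 0;
        for (int w : weights) total += w;
        List<BigInteger> result = new ArrayList<>();
        long cumulative = 0;
        for (int w : weights) {
            cumulative += w;
            // token_i = RING * (w_0 + ... + w_i) / total; the last node lands on
            // RING, which is taken mod RING so it wraps around to token 0.
            BigInteger t = RING.multiply(BigInteger.valueOf(cumulative))
                               .divide(BigInteger.valueOf(total))
                               .mod(RING);
            result.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: three light instances (weight 1), one large (weight 4).
        int[] weights = {1, 1, 1, 4};
        List<BigInteger> ts = tokens(weights);
        for (int i = 0; i < ts.size(); i++) {
            System.out.println("node " + i + " weight " + weights[i] + " token " + ts.get(i));
        }
    }
}
```

With equal weights this reduces to the familiar evenly spaced set of tokens i * 2^127 / N. The weights only approximate "power"; how much work a node actually gets still depends on how the keys hash.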
> 
>>> 
>>>> perhaps a preferred hash range not just a token (and presumably everything 
>>>> else would automatically rebalance itself to make that happen)
>>>> 
>>> 
>>> Unclear what this would do.
>> Well, rather than getting half of the busiest node's work (which is how I 
>> understand it works now), you'd get an amount of work proportional to the 
>> power of the node.
> 
> Assuming you allow it to automatically assign its own token, the new
> node will get half the range of the node with the most data, not the
> most 'busy'.  The amount of work being done by the nodes is not a
> consideration, nor would you want automatic selection of that within
> cassandra except with significant support for long term trend
> collection and analysis, pluggable policies for calculating 'load',
> etc.
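The automatic choice described above can be modelled roughly as: find the node reporting the most data, then split its range in half. This is a simplified sketch under stated assumptions: a 0 to 2^127 ring, a map of token to load that we supply ourselves, and an arithmetic midpoint. As far as I know, Cassandra itself picks the split from samples of the keys that node actually holds, so treat the midpoint here as an approximation, and the class and method names as hypothetical.

```java
import java.math.BigInteger;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of automatic bootstrap token selection: the new node
 * takes half of the range owned by the node with the most data.
 */
public class BootstrapSplit {
    // Assumed RandomPartitioner token space: 0..2^127.
    static final BigInteger RING = BigInteger.ONE.shiftLeft(127);

    /** ring maps token -> load (e.g. bytes stored); assumes at least one node. */
    public static BigInteger chooseToken(TreeMap<BigInteger, Long> ring) {
        // 1. The node with the most data wins; request rate plays no part.
        BigInteger heaviest = null;
        long maxLoad = -1;
        for (Map.Entry<BigInteger, Long> e : ring.entrySet()) {
            if (e.getValue() > maxLoad) {
                maxLoad = e.getValue();
                heaviest = e.getKey();
            }
        }
        // 2. Its range is (predecessor, heaviest]; the predecessor wraps around.
        BigInteger pred = ring.lowerKey(heaviest);
        if (pred == null) pred = ring.lastKey();
        BigInteger width = heaviest.subtract(pred).mod(RING);
        if (width.signum() == 0) width = RING; // single node owns the whole ring
        // 3. New node's token sits halfway along that range, so it takes the first half.
        return pred.add(width.shiftRight(1)).mod(RING);
    }

    public static void main(String[] args) {
        TreeMap<BigInteger, Long> ring = new TreeMap<>();
        BigInteger quarter = RING.shiftRight(2);
        ring.put(BigInteger.ZERO, 100L);                          // owns (3R/4, 0]
        ring.put(quarter, 900L);                                  // owns (0, R/4], most data
        ring.put(quarter.multiply(BigInteger.valueOf(3)), 100L);  // owns (R/4, 3R/4], widest range
        System.out.println(chooseToken(ring)); // prints 2^124, i.e. RING / 8
    }
}
```

Note that the node with the widest range is not the one split; only stored data counts, which is why a blind bootstrap can land somewhere unexpected.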
> 
>>> 
>>> Or just set the token specifically for each node you bootstrap.
>>> Starting a node and crossing your fingers on its token selection is a
>>> recipe for interesting times :)
>> Can you specify a token based on a real key value? How do you know what 
>> token to use to make sure that locally relevant data gets at least one copy 
>> stored locally?
> 
> Again, placement strategy is what you want to investigate.
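On the question of deriving a token from a real key: the sketch below computes a key's token assuming the RandomPartitioner convention of taking the absolute value of the key's MD5 digest read as a signed BigInteger (UTF-8 key bytes are also an assumption), and then finds which node owns it. The helper names and the example key are hypothetical. One consequence worth noting: with a hashing partitioner, pinning a node's token to one key's token only guarantees that single key lands there; related keys hash to scattered positions, which is why placement strategy is the better lever for "locally relevant data".

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeMap;

public class KeyToken {
    /** Token for a key, assuming RandomPartitioner's abs(MD5 as signed BigInteger) rule. */
    public static BigInteger tokenFor(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Owner of token t: the first node token >= t, wrapping to the lowest token. */
    public static String ownerOf(BigInteger t, TreeMap<BigInteger, String> ring) {
        BigInteger nodeToken = ring.ceilingKey(t);
        if (nodeToken == null) nodeToken = ring.firstKey();
        return ring.get(nodeToken);
    }

    public static void main(String[] args) {
        BigInteger t = tokenFor("user:42"); // hypothetical key
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        ring.put(BigInteger.ONE.shiftLeft(126), "node-a");
        ring.put(t, "node-local"); // give the local node exactly this key's token
        System.out.println(ownerOf(t, ring)); // prints node-local
    }
}
```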
> 
>> My understanding is that RackAwareStrategy puts the data on the next node in 
>> the token ring that is in a different datacenter. The problem is if you want 
>> a specific "other datacenter", not just the next one in the list.
> 
> Right, I suggested looking at the source as an example.  If you want a
> more sophisticated placement policy, write one.  They are not
> complicated and you will have a much deeper understanding of the
> mechanism.  IMO, pluggable placement is a remarkable feature.
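As an illustration of "write one", here is the replica-selection logic for a preferred-datacenter policy, modelled as a standalone class rather than against Cassandra's actual strategy interface (whose method names vary by version and are not reproduced here). Assumptions, all hypothetical: the ring is a sorted map of token to node, each node knows its datacenter name, and the policy wants two replicas. The first replica goes to the node owning the token; the second goes to the next node clockwise in a named datacenter, rather than simply the next node in any other datacenter.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical two-replica policy: primary owner plus a node in a chosen datacenter. */
public class PreferredDcPlacement {
    public static final class Node {
        public final String name, dc;
        public Node(String name, String dc) { this.name = name; this.dc = dc; }
    }

    private final TreeMap<BigInteger, Node> ring; // token -> node
    private final String preferredDc;

    public PreferredDcPlacement(TreeMap<BigInteger, Node> ring, String preferredDc) {
        this.ring = ring;
        this.preferredDc = preferredDc;
    }

    /** Nodes in ring order, starting at the owner of t (first token >= t, wrapping). */
    private List<Node> walkFrom(BigInteger t) {
        List<Node> order = new ArrayList<>(ring.tailMap(t, true).values());
        order.addAll(ring.headMap(t, false).values());
        return order;
    }

    public List<Node> replicasFor(BigInteger token) {
        List<Node> walk = walkFrom(token);
        List<Node> replicas = new ArrayList<>();
        Node primary = walk.get(0);
        replicas.add(primary);
        for (Node n : walk.subList(1, walk.size())) {
            boolean wanted = primary.dc.equals(preferredDc)
                    ? !n.dc.equals(primary.dc)   // primary already there: use any other DC
                    : n.dc.equals(preferredDc);  // otherwise insist on the preferred DC
            if (wanted) {
                replicas.add(n);
                break;
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TreeMap<BigInteger, Node> ring = new TreeMap<>();
        ring.put(BigInteger.valueOf(10), new Node("a1", "us-east"));
        ring.put(BigInteger.valueOf(20), new Node("b1", "us-west"));
        ring.put(BigInteger.valueOf(30), new Node("c1", "eu-west"));
        ring.put(BigInteger.valueOf(40), new Node("a2", "us-east"));
        PreferredDcPlacement p = new PreferredDcPlacement(ring, "eu-west");
        for (Node n : p.replicasFor(BigInteger.valueOf(5))) {
            System.out.println(n.name + " (" + n.dc + ")"); // a1, then c1 (b1 is skipped)
        }
    }
}
```

A "next different datacenter" rule would have chosen b1 here; naming the target datacenter skips it. Plugging this into Cassandra means subclassing its replication strategy base class, so the RackAwareStrategy source for your version is the place to check the real method signatures.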
> 
> 
> b
