Pluggable placement: that is cool. It wasn't obvious to me from the documentation I read that this was available. I thought maybe rackaware and rackunaware were hard-coded somewhere. I'm not a Java developer, so I haven't looked at the code much. That said, I'll take a look and see if I can figure out how it works. I have coded in C/C++, so I can probably handle the logic part of the Java code okay.

On 2010-04-04, at 1:18 AM, Benjamin Black wrote:
> On Sat, Apr 3, 2010 at 8:23 PM, Mike Gallamore
> <mike.e.gallam...@googlemail.com> wrote:
>>>
>> I didn't mean a real time determination, more of if the nodes aren't
>> identical. For example if you have a cluster made up of a bunch of EC2 light
>> instances and decide to add a large instance, it would be nice if the new
>> node would get a proportional amount of work based on what its system specs
>> are.
>
> Sure, set the token(s) appropriately.
>
>>>
>>>> perhaps a preferred hash range not just a token (and presumably everything
>>>> else would automatically rebalance itself to make that happen)
>>>>
>>>
>>> Unclear what this would do.
>> Well rather than getting half of the most busy node's work (which is how I
>> understand it works now) you'd get an amount of work that is proportional to
>> the power of the node.
>
> Assuming you allow it to automatically assign its own token, the new
> node will get half the range of the node with the most data, not the
> most 'busy'. The amount of work being done by the nodes is not a
> consideration, nor would you want automatic selection of that within
> cassandra except with significant support for long term trend
> collection and analysis, pluggable policies for calculating 'load',
> etc.
>
>>>
>>> Or just set the token specifically for each node you bootstrap.
>>> Starting a node and crossing your fingers on its token selection is a
>>> recipe for interesting times :)
>> Can you specify a token based on a real key value? How do you know what
>> token to use to make sure that locally relevant data gets at least one copy
>> stored locally?
>
> Again, placement strategy is what you want to investigate.
>
>> My understanding is rackawarestrategy puts the data on the next node in the
>> token ring that is in a different datacenter. The problem is if you want a
>> specific "other datacenter" not just the next one in the list.
>
> Right, I suggested looking at the source as an example. If you want a
> more sophisticated placement policy, write one. They are not
> complicated and you will have a much deeper understanding of the
> mechanism. IMO, pluggable placement is a remarkable feature.
>
>
> b
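
For anyone following along, here is a rough sketch of what "set the token(s) appropriately" could look like for nodes of unequal size. It is not taken from Cassandra itself. It assumes the RandomPartitioner with a token space of 0 to 2**127, that each node owns the range from the previous node's token up to its own, and that keys hash uniformly over the ring. The function names (weighted_tokens, range_sizes) and the weights are made up for illustration, and replication is ignored entirely.

```python
# Sketch only: pick one token per node so that each node's primary range
# is proportional to a relative capacity weight.
# ASSUMPTION: RandomPartitioner-style ring with tokens in [0, 2**127).
RING_SIZE = 2 ** 127


def weighted_tokens(weights):
    """Return one token per node, in ring order.

    Node 0 always gets token 0; each later node's token is placed so that
    the gap back to the previous token matches that node's share.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive integers")
    total = sum(weights)
    tokens = [0]
    running = 0
    for w in weights[1:]:
        running += w
        # Integer arithmetic only, so there is no floating point rounding.
        tokens.append(running * RING_SIZE // total)
    return tokens


def range_sizes(tokens):
    """Return how much of the ring each node owns, given tokens in ring order."""
    if len(tokens) == 1:
        return [RING_SIZE]  # a single node owns the whole ring
    # tokens[i - 1] wraps to the last token for i == 0, which models the
    # wrap-around range owned by the first node.
    return [(t - tokens[i - 1]) % RING_SIZE for i, t in enumerate(tokens)]


if __name__ == "__main__":
    # Hypothetical cluster: two small instances and one twice as large.
    weights = [1, 1, 2]
    for weight, token in zip(weights, weighted_tokens(weights)):
        print(weight, token)
```

With equal weights this reduces to the usual i * 2**127 / N spacing. It only balances how much of the ring each node owns; as noted in the quoted reply, where the replicas go is a matter for the placement strategy.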