Hi,
  I'm still curious whether I got the data movement right in my earlier email (quoted below). Anyone?

Also, does anyone know if I can scp the data directory from a node I want to replace to a new machine? Cassandra's streaming seems much slower than scp.
-Anthony

On Mon, Apr 19, 2010 at 04:48:23PM -0700, Anthony Molinaro wrote:
>
> On Mon, Apr 19, 2010 at 03:28:26PM -0500, Jonathan Ellis wrote:
> > > Can I then 'nodeprobe move <token for range I want to take over>', and
> > > achieve the same as step 2 above?
> >
> > You can't have two nodes with the same token in the ring at once.  So,
> > you can removetoken the old node first, then bootstrap the new one
> > (just specify InitialToken in the config to avoid having it guess
> > one), or you can make it a 3 step process (bootstrap, remove, move) to
> > avoid transferring so much data around.
>
> So I'm still a little fuzzy for your 3 step case on why less data moves,
> but let me run through the two scenarios and see where we get.  Please
> correct me if I'm wrong on some point.
>
> Let's say I have 3 nodes with random partitioner and rack unaware strategy.
> Which means I have something like
>
> Node  Size  Token  KeyRange (self + next in ring)
> ----  ----  -----  ------------------------------
> A     5 G   33     1 -> 66
> B     6 G   66     34 -> 0
> C     2 G   0      67 -> 33
>
> Now let's say Node B is giving us some problems, so we want to replace it
> with another node D.
>
> We've outlined 2 processes.
>
> In the first process you recommend
>
> 1. removetoken on node B
> 2. wait for data to move
> 3. add InitialToken of 66 and AutoBootstrap = true to node D storage-conf.xml
>    then start it
> 4. wait for data to move
>
> So when you do the removetoken, this will cause the following transfers
>   at stage 2
>     Node A sends 34->66 to Node C
>     Node C sends 67->0 to Node A
>   at stage 4
>     Node A sends 34->66 to Node D
>     Node C sends 67->0 to Node D
>
> In the second process I assume you pick a token really close to another token?
>
> 1. add InitialToken of 34 and AutoBootstrap = true to node D storage-conf.xml
>    then start it
> 2. wait for data to move
> 3. removetoken on node B
> 4. wait for data to move
> 5. movetoken on node D to 66
> 6. wait for data to move
>
> This results in the following moves
>   at stage 2
>     Node A/B sends 33->34 to Node D (primary token range)
>     Node B sends 34->66 to Node D (replica range)
>   at stage 4
>     Node C sends 66->0 to Node D (replica range)
>   at stage 6
>     No data movement as D already had 33->0
>
> So it seems like you move all the data twice for process 1 and only a small
> portion twice for process 2 (which is what you said, so hopefully I've
> outlined correctly what is happening).  Does all that sound right?
>
> Once I've run bootstrap with the InitialToken value set in the config, is
> it then ignored in subsequent restarts, and if so can I just remove it
> after that first time?
>
> Thanks,
>
> -Anthony
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro                   <antho...@alumni.caltech.edu>

--
------------------------------------------------------------------------
Anthony Molinaro                   <antho...@alumni.caltech.edu>
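For anyone who wants to check the ranges in the quoted table by hand, here is a minimal Python sketch of that toy ring. It only reproduces the email's own convention (each node holds its primary range plus the primary range of the next node in the ring) and has not been checked against Cassandra's actual replica placement. The ring size of 100 and the function names `primary_ranges` and `held_ranges` are hypothetical, chosen just for illustration.

```python
# Toy model of the ring in the quoted email. Not Cassandra code.
# Assumption: a node with token T owns the keys after the previous token up
# to and including T, written as an inclusive (start, end) pair that may
# wrap around the ring. RING_SIZE = 100 is a made-up value.
RING_SIZE = 100


def primary_ranges(tokens):
    """Map node name -> (start, end) of its primary range."""
    ordered = sorted(tokens.items(), key=lambda kv: kv[1])
    ranges = {}
    for i, (node, token) in enumerate(ordered):
        prev_token = ordered[i - 1][1]  # i == 0 wraps to the last node
        ranges[node] = ((prev_token + 1) % RING_SIZE, token)
    return ranges


def held_ranges(tokens):
    """Map node name -> (start, end) held under the email's table convention:
    the node's own primary range plus the next node's primary range."""
    ordered = sorted(tokens, key=tokens.get)
    primary = primary_ranges(tokens)
    held = {}
    for i, node in enumerate(ordered):
        next_node = ordered[(i + 1) % len(ordered)]
        held[node] = (primary[node][0], primary[next_node][1])
    return held


if __name__ == "__main__":
    ring = {"A": 33, "B": 66, "C": 0}
    for node, (start, end) in sorted(held_ranges(ring).items()):
        print(f"{node}: {start} -> {end}")
```

Calling `held_ranges({"A": 33, "C": 0})` models the ring after the removetoken on B; under the same convention both remaining nodes end up holding the whole ring, which is consistent with the stage 2 transfers listed for the first process.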