On Fri, Mar 25, 2011 at 2:11 PM, ian douglas <i...@armorgames.com> wrote:
> On 03/25/2011 10:12 AM, Jonathan Ellis wrote:
>>
>> On Fri, Mar 25, 2011 at 11:59 AM, ian douglas <i...@armorgames.com> wrote:
>>>
>>> (we're running v0.60)
>>
>> I don't know if you could hear that from where you are, but our whole
>> office just yelled, "WTF!" :)
>
> Ah, that's what that noise was... And yeah, we know we're way behind. Our
> initial delay in upgrading was waiting for 0.7 to come out and then we
> learned we needed a whole new Thrift client for our PHP code base, and then
> we got busy on other things, but we're at a point where we have some time to
> take care of Cassandra and get it upgraded.
>
> Our planned path, now, is:
>
> (our nodes' tokens are numbered using the python code (0, 1/3 and 2/3 times
> 2^127), and called node 1 through 3, respectively; our RF is set to 2 right
> now)
>
> 1. remove node 1 from our software
> 2. bring node 1 offline after a flush/repair/cleanup
> 3. run a cleanup on node 2 and then on node 3 so they have a full copy of
> all data from the old node 1 and each other.
> 4. bring up a new Large 64-bit instance, install 0.6.12, assign a Token
> value of 0 (node 1), RF:2, on a new gossip ring, and copy all data from the
> 32-bit nodes 2 and 3 and run a repair/cleanup to remove any duplicated data
> 5. remove node 3 from our software
> 6. point our code to the new 64-bit node 1
> 7. bring node 3 offline after a flush/repair/cleanup so node 2 has the last
> fresh copy of everything
> 8. bring node 2 offline after a flush/repair/cleanup
> 9. bring up another Large instance, get a copy of all data from our old node
> 2, assign a Token value of (1/2 * 2^127), RF:2, on the new gossip ring, run
> a repair to remove duplicate data, and then a cleanup so it gets replicated
> data from the new node 1
> 10. add the new node 2 to our software
> 11. run a final cleanup on the new node 1 and then on node 2 to make sure
> all data is replicated evenly on both nodes
>
> ... at this point, we should have two 64-bit Large instances, with RF:2, on
> a new gossip ring, replacing three 32-bit systems, with minimal down time
> and no data loss (just a data delay between steps 6 and 10 above).
>
> Questions:
> 1. Does it appear that we've missed any steps, or are doing something out of
> order?
> 2. Is the flush/repair/cleanup overkill when bringing the old nodes offline,
> or is that the correct sequence to follow?
> 3. Will the difference in compute units (lower on Large instances than
> Medium instances) make any noticeable difference, or will the fact that the
> machine is 64-bit handle things efficiently enough such that a Large
> instance works harder than a Medium instance? (never did figure out how
> their compute units work)
> 4. Can we follow similar steps when we're ready to upgrade to 0.7x and have
> our new Thrift client for PHP all squared away?
>
>
> Thanks again for the help!!!
>
>
If you have a node with an old column family you are not using anymore:
stop the node, delete the data, start the node.

Edward
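
For reference, the token arithmetic in the quoted plan (0, 1/3 and 2/3 times 2^127 for three nodes, and 1/2 * 2^127 for the second of two nodes) can be sketched in Python as below. This is only an illustration: the function name evenly_spaced_tokens is hypothetical, and the ring size of 2^127 is taken from the figures quoted above rather than read from any Cassandra configuration.

```python
def evenly_spaced_tokens(node_count, ring_size=2**127):
    """Return one token per node, spaced evenly around the ring.

    ring_size defaults to 2**127, the value used in the quoted plan
    (an assumption carried over from the thread, not queried from a cluster).
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the tokens exact; floats would lose precision
    # at this magnitude.
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Current three-node ring, then the planned two-node ring.
    for count in (3, 2):
        print(count, "nodes:")
        for node_number, token in enumerate(evenly_spaced_tokens(count), start=1):
            print("  node", node_number, "token", token)
```

With two nodes this gives 0 for the new node 1 and 2^126 (that is, 1/2 * 2^127) for the new node 2, matching steps 4 and 9 of the plan.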