Re: Working backwards from production to staging/dev

Edward Capriolo Sat, 26 Mar 2011 07:25:29 -0700

On Fri, Mar 25, 2011 at 2:11 PM, ian douglas <i...@armorgames.com> wrote:
> On 03/25/2011 10:12 AM, Jonathan Ellis wrote:
>>
>> On Fri, Mar 25, 2011 at 11:59 AM, ian douglas <i...@armorgames.com> wrote:
>>>
>>> (we're running v0.60)
>>
>> I don't know if you could hear that from where you are, but our whole
>> office just yelled, "WTF!" :)
>
> Ah, that's what that noise was... And yeah, we know we're way behind. Our
> initial delay in upgrading was waiting for 0.7 to come out and then we
> learned we needed a whole new Thrift client for our PHP code base, and then
> we got busy on other things, but we're at a point where we have some time to
> take care of Cassandra and get it upgraded.
>
>  Our planned path, now, is:
>
> (our nodes' tokens are numbered using the python code (0, 1/3 and 2/3 times
> 2^127), and called node 1 through 3, respectively; our RF is set to 2 right
> now)
>
> 1. remove node 1 from our software
> 2. bring node 1 offline after a flush/repair/cleanup
> 3. run a cleanup on node 2 and then on node 3 so they have a full copy of
> all data from the old node 1 and each other.
> 4. bring up a new Large 64-bit instance, install 0.6.12, assign a Token
> value of 0 (node 1), RF:2, on a new gossip ring, and copy all data from the
> 32-bit nodes 2 and 3 and run a repair/cleanup to remove any duplicated data
> 5. remove node 3 from our software
> 6. point our code to the new 64-bit node 1
> 7. bring node 3 offline after a flush/repair/cleanup so node 2 has the last
> fresh copy of everything
> 8. bring node 2 offline after a flush/repair/cleanup
> 9. bring up another Large instance, get a copy of all data from our old node
> 2, assign a Token value of (1/2 * 2^127), RF:2, on the new gossip ring, run
> a repair to remove duplicate data, and then a cleanup so it gets replicated
> data from the new node 1
> 10. add the new node 2 to our software
> 11. run a final cleanup on the new node 1 and then on node 2 to make sure
> all data is replicated evenly on both nodes
>
> ... at this point, we should have two 64-bit Large instances, with RF:2, on
> a new gossip ring, replacing three 32-bit systems, with minimal down time
> and no data loss (just a data delay between steps 6 and 10 above).
>
> Questions:
> 1. Does it appear that we've missed any steps, or are doing something out
> of order?
> 2. Is the flush/repair/cleanup overkill when bringing the old nodes offline,
> or is that the correct sequence to follow?
> 3. Will the difference in compute units (lower on Large instances than
> Medium instances) make any noticeable difference, or will the fact that the
> machine is 64-bit handle things efficiently enough such that a Large
> instance works harder than a Medium instance? (never did figure out how
> their compute units work)
> 4. Can we follow similar steps when we're ready to upgrade to 0.7x and have
> our new Thrift client for PHP all squared away?
>
>
> Thanks again for the help!!!
>
>


If you have a node with an old column family you are not using
anymore: stop the node, delete that column family's data, start the node.

Edward
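
For reference, the token values mentioned in the quoted plan (0, 1/3 and
2/3 times 2^127 for three nodes, and 1/2 * 2^127 for the second node of a
two-node ring) can be reproduced with a few lines of Python. This is only a
minimal sketch: it assumes RandomPartitioner with a 0..2^127 token range,
and the function name initial_tokens is made up for illustration.

```python
def initial_tokens(node_count):
    """Return evenly spaced tokens over the 0..2**127 range.

    Assumption: the cluster uses RandomPartitioner, whose token
    space runs from 0 to 2**127.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    ring_size = 2 ** 127
    # Node i (counting from 0) gets i/node_count of the ring.
    return [i * ring_size // node_count for i in range(node_count)]


# Old three-node ring, then the planned two-node ring.
for count in (3, 2):
    for node, token in enumerate(initial_tokens(count), start=1):
        print("ring of %d, node %d: %d" % (count, node, token))
```

Integer division keeps the tokens as whole numbers, so the 1/3 and 2/3
values are rounded down.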
