Re: URGENT HELP PLEASE!

Sylvain Lebresne Fri, 25 Mar 2011 13:16:31 -0700

> Although after all the help from the Cassandra community I have a much better 
> understanding of why and how my situation happened, there was still one 
> strange side effect I noticed. For context, I store user accounts and other 
> account information in Cassandra. When the second node was offline and I 
> tried to log into the site, I got an error saying invalid password. Out of 
> curiosity I logged into the cassandra-cli tool and looked at what columns and 
> values were present for my user account. My User CF seemed to have data 
> stored from right before I added the second node. I found that really strange 
> assuming that Cassandra doesn't keep any historical or versioned data? Again, 
> once the second node was back online both servers showed the expected more 
> current data.

What happened is this:
You started your cluster with only one node, so at first all the data was
on that node. Then you added a second node, and Cassandra moved
(approximately) half of the data to it. In theory, at that point the data
that was moved to the second node could have been removed from the first
node (since you had RF=1). However, Cassandra does not do that removal
automatically, for safety reasons. You have to run cleanup on the first
node for that to happen.
So there was stale data on the first node that never got updated, because
the first node was no longer responsible for that data. It was garbage
that just didn't get removed. What you should have done is run
nodetool cleanup on the first node after bootstrapping the second one and
checking that everything was fine.
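
To make the sequence above concrete, here is a toy Python model of it. This
is only a sketch, not Cassandra code: the Node class, the helper functions,
the integer keys and the "key % 100" token function are all made up for
illustration. It assumes RF=1 and shows the bootstrap copying data to the
new node without deleting it from the old one, a later write reaching only
the new owner, and a cleanup step dropping what the old node no longer owns.

```python
# Toy model only: every name here is hypothetical, none of it is Cassandra's API.
# Keys are integers and a key's token is simply key % 100.

class Node:
    def __init__(self, name, token):
        self.name = name
        self.token = token
        self.data = {}  # this node's local copy of key -> value


def owner(ring, key):
    """Return the node owning key: the first node whose token >= the key's token."""
    t = key % 100
    for node in sorted(ring, key=lambda n: n.token):
        if t <= node.token:
            return node
    return min(ring, key=lambda n: n.token)  # wrap around the ring


def write(ring, key, value):
    """With RF=1, a write is stored only on the node that owns the key."""
    owner(ring, key).data[key] = value


def bootstrap(ring, new_node):
    """Add new_node and copy over the keys it now owns.

    The copies on the old nodes are deliberately left in place.
    """
    ring.append(new_node)
    for node in ring:
        if node is new_node:
            continue
        for key, value in node.data.items():
            if owner(ring, key) is new_node:
                new_node.data[key] = value


def cleanup(ring, node):
    """Drop every key the node is no longer responsible for."""
    node.data = {k: v for k, v in node.data.items() if owner(ring, k) is node}


# One-node cluster: node1 owns everything.
node1 = Node("node1", token=99)
ring = [node1]
write(ring, 10, "old-password")
write(ring, 70, "unchanged")

# Add a second node; it takes over tokens 0..49, so key 10 moves to it.
node2 = Node("node2", token=49)
bootstrap(ring, node2)

# A later update only reaches the new owner.
write(ring, 10, "new-password")
stale_before = node1.data.get(10)  # node1 still holds the pre-bootstrap value

cleanup(ring, node1)
stale_after = node1.data.get(10)   # gone once cleanup has run

print(stale_before, node2.data[10], stale_after)
```

In this model node1 keeps the pre-bootstrap value for key 10 until cleanup()
runs, which is the same kind of leftover you saw in your User CF.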

>
> Today I'm preparing to increase my replication factor to 2 and have been 
> reading about the proper way to do that. Although I've found bits and pieces, 
> I haven't found any definitive explanation on how to do it. Could someone 
> please sanity check my intended approach?
>
> 1. Change the RF to 2 and restart Cassandra on both nodes
> 2. Run `nodetool repair` on both nodes, one at a time so as not to tie up both 
> servers (will that sync data between the nodes?)
>
> In a 2 node environment and RF=2 using consistency level of ONE would still 
> ensure data is replicated to both servers, correct?
>
> -----Original Message-----
> From: Sylvain Lebresne [mailto:sylv...@datastax.com]
> Sent: Friday, March 25, 2011 3:01 AM
> To: user@cassandra.apache.org
> Cc: Jared Laprise
> Subject: Re: URGENT HELP PLEASE!
>
> On Fri, Mar 25, 2011 at 1:49 AM, Jared Laprise <ja...@webonyx.com> wrote:
>> Hello all, I'm running 2 Cassandra 6.5 nodes and I brought down the
>> secondary node and restarted the primary node. After Cassandra came
>> back up all data has been reverted to several months ago.
>
> Out of curiosity, when you said 'brought down the secondary node', did that 
> involve a decommission or removeToken? If so, I have an explanation for you.
>
> --
> Sylvain
>
>
>> I could really use some insight here, this is a production website and
>> I need to act quickly. I have a cron job that takes a snapshot every
>> night, but even with that I tried to restore a snapshot on my local
>> development environment and it was also missing a ton of data.
>>
>>
>>
>> Any help will be so appreciated.
>>
>>
>>
>>
>
