Something is not right if it thinks the RF is different. Do you have the
command you ran for repair, and the error it returned?

If you are willing to do the operation again, I'd be interested to see whether
nodetool cleanup causes any data to be removed (you should snapshot the
disks before running this, as it will remove data if the tokens were incorrect).
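
A rough sketch of what I mean (the keyspace name below is just a
placeholder; a disk-level snapshot would work equally well):

    # snapshot first so the SSTables can be restored if cleanup drops data
    nodetool snapshot -t pre-cleanup my_keyspace

    # then run cleanup on each node and watch whether the load drops
    nodetool cleanup my_keyspace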

On Wed., 2 May 2018, 21:48 Faraz Mateen, <fmat...@an10.io> wrote:

> Hi all,
>
> Sorry I couldn't update earlier as I got caught up in some other stuff.
>
> Anyway, my previous 3-node cluster was on version 3.9. I created a new
> Cassandra 3.11.2 cluster with the same number of nodes, on GCE VMs instead
> of DC/OS. My existing cluster has its Cassandra data on persistent disks. I
> made copies of those disks and attached them to the new cluster.
>
> I was using the following link to move data to the new cluster:
>
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsSnapshotRestoreNewCluster.html
>
> As mentioned in the link, I manually assigned token ranges to each node
> according to its corresponding node in the previous cluster. When I
> restarted the cassandra process on the VMs, I noticed that it had
> automatically picked up all my keyspaces and column families. I did not
> recreate the schema, copy data manually, or run sstableloader. I am not
> sure if this should have happened.
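>
> For reference, my understanding of the per-table copy step described in
> that link is roughly the following (keyspace/table names and paths are
> placeholders for my own); attaching the copied disks directly meant I never
> ran this explicitly:
>
>     # copy the old node's SSTables into the matching table directory on the
>     # corresponding new node
>     cp /mnt/old-disk/cassandra/data/my_ks/my_table-<table_id>/* \
>        /var/lib/cassandra/data/my_ks/my_table-<table_id>/
>
>     # then restart cassandra, or load the new files without a restart
>     nodetool refresh my_ks my_table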
>
> Anyway, the data in both clusters is still not in sync. I ran a simple
> count query on a table in both clusters and got different results:
>
> Old cluster: 217699
> New Cluster: 138770
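>
> For reference, the query was just a full-table count along these lines
> (keyspace/table names are placeholders for my own):
>
>     SELECT COUNT(*) FROM my_keyspace.my_table;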
>
> On the new cluster, when I run nodetool repair for my keyspace, it runs
> fine on one node, but the other two nodes say that the keyspace replication
> factor is 1, so repair is not needed. Cqlsh, however, shows that the
> replication factor is 2.
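>
> The cqlsh check on each node was along these lines (again, the keyspace
> name is a placeholder):
>
>     SELECT keyspace_name, replication
>     FROM system_schema.keyspaces
>     WHERE keyspace_name = 'my_keyspace';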
>
> Nodetool status on the new and old clusters shows different output as
> well.
>
> *Cluster1:*
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load        Tokens  Owns  Host ID                               Rack
> UN  10.128.1.1  228.14 GiB  256     ?     63ff8054-934a-4a7a-a33f-405e064bc8e8  rack1
> UN  10.128.1.2  231.25 GiB  256     ?     702e8a31-6441-4444-b569-d2d137d54a5d  rack1
> UN  10.128.1.3  199.91 GiB  256     ?     b5b22a90-f037-433a-8ad9-f370b26cca26  rack1
>
> *Cluster2:*
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load        Tokens  Owns  Host ID                               Rack
> UJ  10.142.0.4  211.27 GiB  256     ?     c55fef77-9c78-449c-b0d9-64e755caee7d  rack1
> UN  10.142.0.2  228.14 GiB  256     ?     0065c8e1-47be-4cf8-a3fe-3f4d20ff1b47  rack1
> UJ  10.142.0.3  241.77 GiB  256     ?     f3b3f409-d108-4751-93ba-682692e46318  rack1
>
> This is weird because both clusters have essentially the same disks
> attached to them. Only one node in cluster2 (10.142.0.2) has the same load
> as its counterpart in cluster1 (10.128.1.1).
> This is also the node where nodetool repair seems to run fine, and it is
> the seed node in the second cluster.
>
> I am confused about what might be causing this inconsistency in load and
> replication factor. Has anyone ever seen a different replication factor for
> the same keyspace on different nodes? Is there a problem in my workflow?
> Can anyone suggest the best way to move data from one cluster to another?
>
> Any help will be greatly appreciated.
>
> On Tue, Apr 17, 2018 at 6:52 AM, Faraz Mateen <fmat...@an10.io> wrote:
>
>> Thanks for the response guys.
>>
>> Let me try setting the token ranges manually and moving the data again to
>> the correct nodes. I will update with the outcome soon.
>>
>>
>> On Tue, Apr 17, 2018 at 5:42 AM, kurt greaves <k...@instaclustr.com>
>> wrote:
>>
>>> Sorry for the delay.
>>>
>>>> Is the problem related to token ranges? How can I find out token range
>>>> for each node?
>>>> What can I do to further debug and root cause this?
>>>
>>> Very likely. See below.
>>>
>>>> My previous cluster has 3 nodes but the replication factor is 2. I am not
>>>> exactly sure how I would handle the tokens. Can you explain that a bit?
>>>
>>> The new cluster will have to have the same token ring as the old one if you
>>> are copying from node to node. Basically, you should get the set of tokens
>>> for each node (from nodetool ring) and, when you spin up your 3 new nodes,
>>> set initial_token in the yaml to the comma-separated list of tokens for
>>> *exactly one* node from the previous cluster. When restoring the SSTables
>>> you need to make sure you take the SSTables from the original node and
>>> place them on the new node that has the *same* list of tokens. If you don't
>>> do this, it won't be a replica for all the data in those SSTables and
>>> consequently you'll lose data (or it simply won't be available).
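>>>
>>> A rough sketch of what that looks like in practice (the IP is a
>>> placeholder, and with vnodes the list will have 256 entries per node):
>>>
>>>     # on the old cluster: collect the tokens owned by one node
>>>     nodetool ring | grep '<old-node-ip>' | awk '{print $NF}' | paste -sd, -
>>>
>>>     # in cassandra.yaml on the matching new node, before its first start:
>>>     initial_token: <comma-separated token list from above>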
>>>
>>
>>
>>
>> --
>> Faraz Mateen
>>
>
>
>
> --
> Faraz Mateen
>
