This may not answer all your questions, but maybe it will help move you further along:

- You could copy the data (not system) folders *IF* the clusters match in topology. This would include the clusters having the same token range assignment(s). And you would have to copy the folders from one original node to the exact matching node in the second cluster. [To learn more, read about how Cassandra distributes data across the cluster. It will take effort to have exact matching clusters.]
- If you cannot make an exact match in topology, investigate something like dsbulk for moving data in and out of clusters with whatever topology they have. This is a much more portable solution.
- I know that teams also do disk snapshots on cloud platforms as one back-up solution. They can attach that disk snapshot to a new VM (configured the same as the previous one) as needed. I don’t know all the particulars of this approach, though.
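As a rough illustration of the "clusters match" requirement, here is a minimal Python sketch that compares a few `cassandra.yaml` settings between the source node and the target node before any folders are copied. The helper names (`read_settings`, `diff_settings`) and the simple line-based parsing are assumptions made for this example; it is not a real YAML parser, and it does not check data center names, which come from the snitch configuration rather than from these keys.

```python
# Sketch only: compare a handful of top-level cassandra.yaml settings.
# Assumption: both files use plain "key: value" lines for these settings.

KEYS = ("cluster_name", "num_tokens", "partitioner", "endpoint_snitch", "initial_token")


def read_settings(yaml_text, keys=KEYS):
    """Return {key: value} for top-level, uncommented 'key: value' lines."""
    found = {}
    for line in yaml_text.splitlines():
        if not line or line[0] in " \t#-":
            continue  # skip blank lines, comments, nested entries and list items
        key, sep, value = line.partition(":")
        if sep and key.strip() in keys:
            # Drop a trailing inline comment and surrounding quotes.
            found[key.strip()] = value.split(" #")[0].strip().strip("'\"")
    return found


def diff_settings(source_text, target_text, keys=KEYS):
    """Return {key: (source_value, target_value)} for every setting that differs."""
    src = read_settings(source_text, keys)
    dst = read_settings(target_text, keys)
    return {k: (src.get(k), dst.get(k)) for k in keys if src.get(k) != dst.get(k)}
```

Calling `diff_settings(open(source_yaml).read(), open(target_yaml).read())` with the two configuration files (the paths are placeholders) returns an empty dict when the listed settings agree. Any entry in the result is a mismatch worth fixing before copying.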
Sean Durity – Staff Systems Engineer, Cassandra

From: Manu Chadha <manu.cha...@hotmail.com>
Sent: Saturday, January 2, 2021 4:54 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: unable to restore data from copied data directory

Thanks. Shall I copy only the system_schema folder? I tried copying all the folders and ran into the following issues:

1. C* didn’t start because the cluster name by default is Test Cluster, while the tables seem to refer to the K8ssandra cluster: “Saved cluster name k8ssandra != configured name Test Cluster”
2. Then I got this error: “Cannot start node if snitch's data center (datacenter1) differs from previous data center (dc1). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.”
3. At one point I also got an error about the number of tokens (cannot change the number of tokens from 257 to 256).

It seems it is not as straightforward as just copying the folders. Any advice please?

From: Jeff Jirsa <jji...@gmail.com>
Sent: 02 January 2021 20:57
To: user@cassandra.apache.org
Subject: Re: unable to restore data from copied data directory

On Jan 2, 2021, at 7:30 AM, Manu Chadha <manu.cha...@hotmail.com> wrote:

Hi

Can I just copy the keyspace folders into a new Cassandra installation as a backup and restore strategy? I am trying to do that but it isn’t working.

I am using `K8ssandra` to run my single node C* cluster. I am experimenting with data backup and restore. Though K8ssandra uses medusa for data backup and restore, I couldn’t use it, so I thought to test by simply copying/pasting the data directory. But I don’t see my data after restore.
There could be mistakes in my approach so I am not really sure where to look. For example:

1. K8ssandra uses Kubernetes’ Persistent Volume Claims. Does that mean that the data is actually stored somewhere else and not in the data directories of the keyspaces?
2. Is there a way to look into the files in the data directories of the keyspaces to check what data is there? Maybe the data isn’t backed up properly.

The steps I did to copy the data are:

GKE cluster -> default-pool -> found the node running the k8ssandra-dc1-default-sts-0 container.

Go to VM instances -> SSH to the node which is running the k8ssandra-dc1-default-sts-0 container.

Once SSHed, ran “docker exec -it k8s_cassandra_k8ssandra-dc1-default-sts-0_default_00b0d72a-c124-4b04-b25d-9e0f17edc582_0 /bin/bash”

I noticed that the container has Cassandra:
/opt/cassandra
./opt/cassandra/bin/cassandra
./opt/cassandra/javadoc/org/apache/cassandra
./var/lib/cassandra
./var/log/cassandra

cd opt/cassandra/data/data. There were directories for each keyspace. I assume that when taking backups we can take a copy of this data directory. Then once we need to restore, we can simply copy them back to the new node’s data directory.

Note that I couldn’t run nodetool inside the container (nodetool flush or nodetool refresh) due to a JMX issue. I don’t know how important it is to run the command. There is no traffic running on the systems though.

I copied the data directory from OUTSIDE the container (from the node) using “docker cp container name:src_path dest_path” (e.g. docker cp k8s_cassandra_k8ssandra-dc1-default-sts-0_default_00b0d72a-c124-4b04-b25d-9e0f17edc582_0:/opt/cassandra/data/data backup/)

Then to transfer the backup directory to cloudshell (the console in the web browser), I used “gcloud compute scp --recurse gke-k8ssandra-cluster-default-pool-1b1cc22a-rd6t:~/backup/data ~/K8ssandra_data_backup”

Then I copied from cloudshell to my laptop/workstation, using the cloudshell editor. This downloaded a tar of the backup (using a download link).
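On the question of checking what actually ended up in the copied data directory, a minimal Python sketch along these lines could list which tables have SSTable data files in the backup. It assumes the usual on-disk layout of `<data dir>/<keyspace>/<table>-<id>/` with data files ending in `-Data.db`; the function name `inventory` is made up for this example. It only counts files and bytes. Cassandra 3.x also ships an `sstabledump` tool for looking at the rows inside an SSTable.

```python
from pathlib import Path


def inventory(data_dir):
    """Map 'keyspace/table' -> (number of *-Data.db files, total bytes)."""
    result = {}
    for table_dir in sorted(Path(data_dir).glob("*/*")):
        if not table_dir.is_dir():
            continue
        # Table directories are named <table>-<id>; keep only the table name.
        table = table_dir.name.rsplit("-", 1)[0]
        # Non-recursive glob, so snapshots/ and backups/ subfolders are ignored.
        data_files = list(table_dir.glob("*-Data.db"))
        key = f"{table_dir.parent.name}/{table}"
        count, size = result.get(key, (0, 0))
        result[key] = (
            count + len(data_files),
            size + sum(f.stat().st_size for f in data_files),
        )
    return result
```

A table that shows `(0, 0)` here has no SSTables in the copy. That may mean its writes were still in memory or in the commit log when the copy was taken, which is one reason a flush before copying matters.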
Then I downloaded a new .gz of C* 3.11.6 on my laptop. After unzipping it, I noticed that it hasn’t got a data directory. I ran C* and noticed that only the default keyspaces were present. I also noticed that the data directory was now created. I then stopped C*.

Then I copied the contents of the backup folder (only the keyspace name folders, not all folders) into the data/data directory of the new Cassandra system, which wasn’t running.

Then I restarted the C* system but I can’t see the data via cqlsh. I can’t see the keyspace either, which is probably because I should also copy the system and system-* folders. But is it safe to do so? I tried it but ran into several issues around cluster name, snitch, data center names etc.

The schemas are stored in system_schema so until / unless you copy that it’s not gonna work.

Alternatively you can issue the DDL / CREATE statements on your laptop, it’ll make new directories, you can copy the data files into those directories. This is your safest and easiest option most of the time

Would the approach of just copy/pasting the folders work?

Thanks
Manu
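The suggestion earlier in the thread (issue the CREATE statements on the new node so it makes fresh table directories, then copy the data files into those directories) could be sketched in Python roughly as follows. The new directories will usually have a different id suffix than the backed-up ones, so this example matches them by table name. The function names are hypothetical. It assumes one directory per table name on the target and that the target node is stopped while files are copied.

```python
import shutil
from pathlib import Path


def table_name(table_dir):
    """Strip the '-<id>' suffix from a table directory name."""
    return table_dir.name.rsplit("-", 1)[0]


def copy_sstables(backup_keyspace_dir, target_keyspace_dir):
    """Copy files from each backup table dir into the target dir of the same table.

    Returns (copied_table_names, skipped_backup_dir_names). A backup table is
    skipped when the target keyspace has no directory for that table name,
    i.e. the CREATE TABLE has not been run there yet.
    """
    targets = {
        table_name(d): d for d in Path(target_keyspace_dir).iterdir() if d.is_dir()
    }
    copied, skipped = [], []
    for src in sorted(Path(backup_keyspace_dir).iterdir()):
        if not src.is_dir():
            continue
        dst = targets.get(table_name(src))
        if dst is None:
            skipped.append(src.name)
            continue
        for f in src.iterdir():
            if f.is_file():  # ignore snapshots/ and backups/ subfolders
                shutil.copy2(f, dst / f.name)
        copied.append(table_name(src))
    return copied, skipped
```

This would be run once per keyspace, for example `copy_sstables("backup/data/my_keyspace", "data/data/my_keyspace")`, where both paths are placeholders. Anything in the skipped list points at a table whose CREATE statement still needs to be issued on the new node.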