Hi Jon,

Thanks for taking the time to read and reply to this proposal. Would encourage you to approach it from
an attitude of seeking understanding on the part of the first-time CEP author, as this reply casts it off
pretty quickly as NIH.

The proposal isn't mine, but I'll offer a few notes on where I see this as valuable:

– It's valuable for Cassandra to have an ecosystem-native mechanism of migrating data between physical/virtual
instances outside the standard streaming path. As Hari mentions, the current ecosystem-native approach of
executing repairs, decommissions, and bootstraps is time-consuming and cumbersome.

– An ecosystem-native
solution is safer than a bunch of bash and rsync. Defining a safe protocol to migrate data between instances
via rsync without downtime is surprisingly difficult, and even more so to do safely and repeatedly at scale.
Enabling this process to be orchestrated by a control plane mechanizing official endpoints of the database and
sidecar, rather than trying to move data around behind its back, is much safer than hoping one's cobbled
together the right set of scripts to move data in a way that won't violate strong / transactional consistency
guarantees. This complexity is exemplified by the "Migrating One Instance" section of the doc
and its state machine diagram, which illustrate an approach to solving that problem.

– An ecosystem-native
approach poses fewer security concerns than rsync. mTLS-authenticated endpoints in the sidecar for data
movement eliminate the requirement for orchestration to occur via (typically) high-privilege SSH, which often
either allows some form of code execution or requires complex efforts to scope the SSH privileges of
particular users. They also eliminate the need to manage and secure rsyncd processes on each instance if
not using SSH.

– An
ecosystem-native approach is more instrumentable and measurable than rsync. Support for data migration
endpoints in the sidecar would allow for metrics reporting, stats collection, and alerting via mature and
modern mechanisms rather than monitoring the output of a shell script.

I'll yield to Hari to share more, though today is a public holiday in India. I do see this CEP as solving an
important problem.

Thanks,

– Scott

On Apr 8, 2024, at 10:23 AM, Jon Haddad <j...@jonhaddad.com> wrote:

This seems like a lot of work to create an
rsync alternative. I can't really say I see the point. I noticed your "rejected alternatives"
mentions it with this note:

    However, it might not be permitted by the administrator or available in various
    environments such as Kubernetes or virtual instances like EC2. Enabling data transfer through a sidecar
    facilitates smooth instance migration.

This feels more like NIH than solving a real problem, as what you've
listed is a hypothetical, and one that's easily addressed.

Jon

On Fri, Apr 5, 2024 at 3:47 AM Venkata Hari Krishna Nukala <n.v.harikrishna.apa...@gmail.com> wrote:

Hi all,

I have filed CEP-40 [1] for live
migrating Cassandra instances using the Cassandra Sidecar. When someone needs to move all or a portion of the
Cassandra nodes belonging to a cluster to different hosts, the traditional approach of Cassandra node
replacement can be time-consuming due to repairs and the bootstrapping of new nodes. Depending on the volume of
the storage service load, replacements (repair + bootstrap) may take anywhere from a few hours to days.

I am proposing a Sidecar-based solution to address these challenges. It transfers data from
the old host (source) to the new host (destination) and then brings up the Cassandra process at the
destination, to enable fast instance migration. This approach would help to minimise node downtime, as it is
based on a Sidecar solution for data transfer and avoids repairs and bootstrap.

Looking forward to the discussions.

[1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances

Thanks!
Hari