Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

C. Scott Andreas Fri, 19 Apr 2024 14:36:58 -0700

These are the salient points here for me, yes:

> My understanding from the proposal is that Sidecar would be able to migrate from a Cassandra instance that is already dead and cannot recover.

> That’s one thing I like about having it an external process — not that it’s bullet proof but it’s one less thing to worry about.

The manual/rsync version of the state machine Hari describes in the CEP is one of the best escape hatches for migrating an instance that’s overstressed, limping on ailing hardware, or that has exhausted disk. If the system is functional but the C* process is in bad shape, it’s great to have a paved-path flow for migrating the instance and data to more capable hardware.

I also agree in principle that “streaming should be just as fast via the C* process itself.” This hits a couple snags today:

- This option isn’t available when the C* instance is struggling.

- When replacing an entire cluster’s hardware with new machines, doing it via host replacements of all instances (which also requires repairs) or by doubling then halving capacity is incredibly cumbersome and disruptive to the database’s users, especially if the DB is already having a hard time.

- The host replacement process also puts a lot of stress on gossip and is a great way to encounter all sorts of painful races if you perform it hundreds or thousands of times (but shouldn’t be a problem in TCM-world).

So I think I agree with both points:

- Cassandra should be able to do this itself.

- It is also valuable to have a paved path implementation of a safe migration/forklift state machine when you’re in a bind, or need to do this hundreds or thousands of times.
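
To make the "state machine" idea concrete, here is a minimal sketch of a forklift-style migration that only moves forward when the current step succeeds, so a failed step can be retried. The step names and the `Migration` class are hypothetical and are not taken from the CEP. In particular, stopping the source and doing a final copy pass is an assumption about how an rsync-style flow would typically be ordered.

```python
from enum import Enum, auto


class Step(Enum):
    # Hypothetical steps; the CEP defines its own states.
    COPY_WHILE_SOURCE_RUNS = auto()
    STOP_SOURCE = auto()
    COPY_REMAINING_FILES = auto()
    START_DESTINATION = auto()
    DONE = auto()


ORDER = list(Step)  # Enum iteration follows definition order


class Migration:
    def __init__(self, actions):
        # actions maps each non-DONE Step to a callable returning True on success
        self.actions = actions
        self.step = ORDER[0]
        self.history = []

    def advance(self):
        """Run the current step's action; move forward only if it succeeds."""
        if self.step is Step.DONE:
            return False
        if not self.actions[self.step]():
            return False  # stay on this step so it can be retried
        self.history.append(self.step)
        self.step = ORDER[ORDER.index(self.step) + 1]
        return True

    def run(self):
        """Advance until DONE or until a step fails; report whether we finished."""
        while self.step is not Step.DONE and self.advance():
            pass
        return self.step is Step.DONE
```

If a step fails, `run()` returns `False` and `step` still names the failed step, so an operator (or a retry loop) can pick up from there rather than starting over.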

On zero copy: what really makes ZCS fast compared to legacy streaming is that the JVM is able to ship entire files around, rather than deserializing SSTables and reserializing them to stream each individual row. That’s the slow and expensive part. It’s true that TLS means you incur an extra memcpy as that stream is encrypted before it’s chunked into packets — but the cost of that memcpy for encryption pales in comparison to how slow deserializing/reserializing SSTables is/was.
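
A minimal sketch of that difference, assuming a toy length-prefixed record file rather than a real SSTable. All function names here are hypothetical, and this is Python rather than Cassandra's Java streaming code. The first sender hands the whole file to the socket as opaque bytes (Python's `socket.sendfile` uses `os.sendfile` where the platform supports it); the second decodes and re-encodes every record before sending it.

```python
import os
import socket
import struct
import tempfile
import threading


def write_toy_file(rows):
    """Write rows as 4-byte big-endian length + payload (a toy format, not SSTable)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for row in rows:
            f.write(struct.pack(">I", len(row)) + row)
    return path


def stream_whole_file(path, sock):
    """Ship the file as opaque bytes without looking at individual records."""
    with open(path, "rb") as f:
        return sock.sendfile(f)


def stream_row_by_row(path, sock):
    """Decode each record and re-encode it before sending (the expensive path)."""
    sent = 0
    with open(path, "rb") as f:
        while header := f.read(4):
            (length,) = struct.unpack(">I", header)
            row = f.read(length)
            frame = struct.pack(">I", len(row)) + row
            sock.sendall(frame)
            sent += len(frame)
    return sent


def drain(sock, sink):
    """Read from sock until EOF, appending everything to sink."""
    while chunk := sock.recv(65536):
        sink.extend(chunk)


def transfer(send_fn, path):
    """Send path over a local socket pair with send_fn and return the bytes received."""
    sender, receiver = socket.socketpair()
    received = bytearray()
    reader = threading.Thread(target=drain, args=(receiver, received))
    reader.start()
    try:
        send_fn(path, sender)
    finally:
        sender.close()  # signals EOF to the reader
    reader.join()
    receiver.close()
    return bytes(received)
```

Both senders deliver identical bytes to the receiver; the difference is only in how much per-record work the sending process does along the way.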

ZCS with TLS can push 20Gbps+ today on decent but not extravagant Xeon hardware. In-kernel TLS would also still encounter a memcpy in the encryption path; the kernel.org doc alludes to this via “the kernel will need to allocate a buffer for the encrypted data.” But it would allow using sendfile and cut a copy in userspace. If someone is interested in testing it out I’d love to learn what they find. It’s always a great surprise to learn there’s more perf left on the table. This comparison looks promising: https://tinselcity.github.io/SSL_Sendfile/

– Scott

—

Mobile

On Apr 19, 2024, at 11:31 AM, Jordan West <jorda...@gmail.com> wrote:

If we are considering the main process then we have to do some additional work to ensure that it doesn’t put pressure on the JVM and introduce latency. That’s one thing I like about having it an external process — not that it’s bullet proof but it’s one less thing to worry about.

Jordan

On Thu, Apr 18, 2024 at 15:39 Francisco Guerrero <fran...@apache.org> wrote:
My understanding from the proposal is that Sidecar would be able to migrate
from a Cassandra instance that is already dead and cannot recover. This is a
scenario that is possible where Sidecar should still be able to migrate to a new
instance.

Alternatively, Cassandra itself could have some flag to start up with limited
subsystems enabled to allow live migration.

In any case, we'll need to weigh the pros and cons of each alternative and
decide if the live migration process can be handled within the C* process itself
or if we allow this functionality to be handled by Sidecar.

I am looking forward to this feature though, as it will be of great value for many
users across the ecosystem.

On 2024/04/18 22:25:23 Jon Haddad wrote:
> Hmm... I guess if you're using encryption you can't use ZCS so there's that.
>
> It probably makes sense to implement kernel TLS:
> https://www.kernel.org/doc/html/v5.7/networking/tls.html
>
> Then we can get ZCS all the time, for bootstrap & replacements.
>
> Jon
>
>
> On Thu, Apr 18, 2024 at 12:50 PM Jon Haddad <j...@jonhaddad.com> wrote:
>
> > Ariel, having it in C* process makes sense to me.
> >
> > Please correct me if I'm wrong here, but shouldn't using ZCS to transfer
> > have no distinguishable difference in overhead from doing it using the
> > sidecar? Since the underlying call is sendfile, never hitting userspace, I
> > can't see why we'd opt for the transfer in sidecar. What's the
> > advantage of duplicating the work that's already been done?
> >
> > I can see using the sidecar for coordination to start and stop instances
> > or do things that require something out of process.
> >
> > Jon
> >
> >
> > On Thu, Apr 18, 2024 at 12:44 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
> >
> >> Hi,
> >>
> >> If there is a faster/better way to replace a node why not have Cassandra
> >> support that natively without the sidecar so people who aren’t running the
> >> sidecar can benefit?
> >>
> >> Copying files over a network shouldn’t be slow in C* and it would also
> >> already have all the connectivity issues solved.
> >>
> >> Regards,
> >> Ariel
> >>
> >> On Fri, Apr 5, 2024, at 6:46 AM, Venkata Hari Krishna Nukala wrote:
> >>
> >> Hi all,
> >>
> >> I have filed CEP-40 [1] for live migrating Cassandra instances using the
> >> Cassandra Sidecar.
> >>
> >> When someone needs to move all or a portion of the Cassandra nodes
> >> belonging to a cluster to different hosts, the traditional approach of
> >> Cassandra node replacement can be time-consuming due to repairs and the
> >> bootstrapping of new nodes. Depending on the volume of the storage service
> >> load, replacements (repair + bootstrap) may take anywhere from a few hours
> >> to days.
> >>
> >> Proposing a Sidecar based solution to address these challenges. This
> >> solution proposes transferring data from the old host (source) to the new
> >> host (destination) and then bringing up the Cassandra process at the
> >> destination, to enable fast instance migration. This approach would help to
> >> minimise node downtime, as it is based on a Sidecar solution for data
> >> transfer and avoids repairs and bootstrap.
> >>
> >> Looking forward to the discussions.
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
> >>
> >> Thanks!
> >> Hari
> >>
> >>
> >>
>
