Are you proposing that we manage backups in the DB instead of Sidecar, or that we have the same functionality in both C* proper and the sidecar? Or that we ship C* with backups to a local filesystem only?
Where should the line be on what goes into Sidecar and what goes into C* proper?

Jon

On Sun, Jan 12, 2025 at 3:04 PM Štefan Miklošovič <smikloso...@apache.org> wrote:

> Oh yeah, I knew Sidecar would be mentioned, let's dive into that.
>
> Sidecar has a lot of endpoints / functionality; backup / restore is just part of that.
>
> What I proposed also has these advantages:
>
> 1) Every time you go to upload to some cloud storage provider, you need to add all the dependencies to Sidecar to do that. In the case of S3, we need to add S3 libs. What about Azure? We need to add a library which knows how to talk to Azure. Then GCP ... This is probably the reason why this "cloud specific" functionality was never part of Cassandra itself: by adding all the libraries with all their dependencies, we would bloat the tarball unnecessarily, have to track dependencies which might be incompatible, etc.
>
> However, you can also mount an S3 bucket to a system and it acts as any other native data dir. You can do the same with Azure (1), etc. But you do not need to depend on any library. It will just copy files. That's it. That means we are ready for whatever storage there might be, as long as it can be mounted locally. We would just have the same code for Azure, S3, NFS, a local disk ... anything.
>
> 2) I am not sure we should _force_ people to use Sidecar if there are far simpler ways to do the job. If we just enabled snapshots to be taken outside of the Cassandra data dir, there would be no reason to use Sidecar just to be able to back up snapshots, because Cassandra could do it itself. I think we should strive for doing as much as possible with the least amount of effort, and I do not think that taking care of Sidecar for each node in a cluster, configuring it and learning it should be mandatory. What if a respective business is not interested in running Sidecar and they just want to copy directly from Cassandra and be done with it? If we force people to use Sidecar, then somebody has to take care of all of that.
>
> I am not saying that Sidecar is not suitable for restoring / backing up, but I do not see anything wrong with having options.
>
> (1) https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-configuration
>
> On Sun, Jan 12, 2025 at 11:46 PM Jon Haddad <j...@rustyrazorblade.com> wrote:
>
>> Sounds like part of a backup strategy. Probably worth chiming in on the Sidecar issue: https://issues.apache.org/jira/browse/CASSSIDECAR-148.
>>
>> IIRC, Medusa and Tablesnap both upload a manifest and don't upload multiple copies of the same SSTables. I think this should definitely be part of our backup system.
>>
>> Jon
>>
>> On Sun, Jan 12, 2025 at 10:25 AM Štefan Miklošovič <smikloso...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> I would like to run this through the ML to gather feedback, as we are contemplating making this happen.
>>>
>>> Currently, snapshots are just hard links located in a snapshot directory, pointing to the live data directory. That is super handy as it occupies virtually zero disk space (until the underlying SSTables are compacted away, at which point their size "materializes").
>>>
>>> On the other hand, because it is a hard link, it is not possible to make hard links across block devices (the infamous "Invalid cross-device link" error). That means that snapshots can only ever be located on the very same disk Cassandra has its data dirs on. A minimal illustration of this limitation is sketched right below.
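>>>
>>> For illustration only - a minimal sketch with plain JDK NIO of why hard links cannot cross devices; the paths below are hypothetical:
>>>
>>>     import java.nio.file.*;
>>>
>>>     public class CrossDeviceLinkDemo
>>>     {
>>>         public static void main(String[] args) throws Exception
>>>         {
>>>             // hypothetical paths: SSTable on the fast local disk, link target on a different device (e.g. an NFS mount)
>>>             Path sstable = Paths.get("/var/lib/cassandra/data/ks/tbl/nb-1-big-Data.db");
>>>             Path link = Paths.get("/mnt/nfs/cassandra/snapshots/today/nb-1-big-Data.db");
>>>
>>>             try
>>>             {
>>>                 // hard links only work within a single block device / file system
>>>                 Files.createLink(link, sstable);
>>>             }
>>>             catch (FileSystemException e)
>>>             {
>>>                 // on Linux this surfaces as "Invalid cross-device link" (EXDEV)
>>>                 System.err.println("hard link failed: " + e.getMessage());
>>>             }
>>>         }
>>>     }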
>>>
>>> Imagine there is a company ABC which has a 10 TiB disk (or NFS share) mounted to a Cassandra node, and they would like to use that as cheap / cold storage for snapshots. They do not care about the speed of such storage, nor do they care about how much space the snapshots occupy. On the other hand, they do not want snapshots occupying the disk space where Cassandra has its data, because they consider that a waste of space. They would like to utilize the fast disk and its space for production data to the max, and snapshots might eat a lot of that space unnecessarily.
>>>
>>> There might be a configuration property like "snapshot_root_dir: /mnt/nfs/cassandra", and if a snapshot is taken, it would just copy SSTables there, but we need to be a little bit smart here. (By default, it would all work as it does now - hard links to snapshot directories located under Cassandra's data_file_directories.)
>>>
>>> Because it is a copy, it occupies disk space. But if we took 100 snapshots of the same SSTables, we would not want to copy the same files 100 times. There is a very handy way to prevent this - unique SSTable identifiers (under the already existing uuid_sstable_identifiers_enabled property) - so we could have a flat destination hierarchy where all SSTables are located in the same directory, and we would just check whether such an SSTable is already there before copying it. Snapshot manifests (currently manifest.json) would then contain all SSTables a logical snapshot consists of. A rough sketch of this copy-if-absent idea follows further below.
>>>
>>> This would be possible only for _user snapshots_. All snapshots taken by Cassandra itself (diagnostic snapshots, snapshots upon repairs, snapshots against all system tables, ephemeral snapshots) would continue to be hard links, and it would not be possible to locate them outside of the live data dirs.
>>>
>>> The advantages / characteristics of this approach for user snapshots:
>>>
>>> 1. Cassandra will be able to create snapshots located on different devices.
>>> 2. From an implementation perspective it would be totally transparent; there would be no specific code about "where" we copy. From the Java perspective, we would just copy as we copy anywhere else.
>>> 3. All the tooling would work as it does now - nodetool listsnapshots / clearsnapshot / snapshot. Same outputs, same behavior.
>>> 4. No need for external tools copying SSTables to the desired destination, custom scripts, manual synchronisation ...
>>> 5. Snapshots located outside of Cassandra's live data dirs would behave the same when it comes to snapshot TTL (a TTL on a snapshot means that after a given period of time it is automatically removed). This logic would be the same, hence there is no need to re-invent the wheel when it comes to removing expired snapshots from the operator's perspective.
>>> 6. Such a solution would deduplicate SSTables, so it would be as space-efficient as possible (though not as efficient as hard links, for the obvious reasons mentioned above).
>>>
>>> It seems to me that there has recently been a "push" to add more logic to Cassandra where it was previously delegated to external tooling; for example, the CEP around automatic repairs basically does what external tooling does, we just move it under Cassandra. We would love to get rid of a lot of tooling and custom-written logic around copying snapshot SSTables.
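>>>
>>> To make the copy-if-absent idea mentioned above concrete, here is a rough, purely illustrative sketch in plain JDK NIO - the class and method names are hypothetical and this is not the actual Cassandra code:
>>>
>>>     import java.nio.file.*;
>>>     import java.util.*;
>>>
>>>     public class SnapshotCopySketch
>>>     {
>>>         // copy each SSTable component into a flat directory, skipping files that are already there;
>>>         // with unique SSTable identifiers, an identical file name means an identical SSTable
>>>         static List<String> copyIfAbsent(List<Path> components, Path snapshotRootDir) throws Exception
>>>         {
>>>             Path flatDir = snapshotRootDir.resolve("sstables");
>>>             Files.createDirectories(flatDir);
>>>             List<String> names = new ArrayList<>();
>>>             for (Path component : components)
>>>             {
>>>                 Path target = flatDir.resolve(component.getFileName());
>>>                 if (!Files.exists(target))
>>>                     Files.copy(component, target, StandardCopyOption.COPY_ATTRIBUTES);
>>>                 names.add(component.getFileName().toString());
>>>             }
>>>             return names;
>>>         }
>>>
>>>         // write a per-snapshot manifest listing the SSTables the logical snapshot consists of
>>>         static void writeManifest(Path snapshotRootDir, String snapshotName, List<String> sstables) throws Exception
>>>         {
>>>             Path manifest = snapshotRootDir.resolve(snapshotName).resolve("manifest.json");
>>>             Files.createDirectories(manifest.getParent());
>>>             Files.write(manifest, ("{\"files\": [\"" + String.join("\", \"", sstables) + "\"]}").getBytes());
>>>         }
>>>     }
>>>
>>> The real implementation would of course reuse the existing snapshot machinery; the sketch only shows that nothing beyond plain file copies is needed.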
>>>
>>> From the implementation perspective, it would be just plain Java, without any external dependencies, etc. There seems to be a lot to gain from relatively straightforward additions to the snapshotting code.
>>>
>>> We did some serious housekeeping in CASSANDRA-18111, where we consolidated and centralized everything related to snapshot management, so we feel comfortable building logic like this on top of that. In fact, CASSANDRA-18111 was a prerequisite for this, because we did not want to base this work on the pre-18111 state of things when it comes to snapshots (it was all over the code base, with fragmented and duplicated logic, etc.).
>>>
>>> WDYT?
>>>
>>> Regards
>>