Are you proposing that we manage backups in the DB instead of Sidecar, or that we have the same functionality in both C* proper and the sidecar? Or that we ship C* with backups to a local filesystem only?
Where should the line be on what goes into Sidecar and what goes into C* proper?

Jon

On Sun, Jan 12, 2025 at 3:04 PM Štefan Miklošovič <smikloso...@apache.org> wrote:

> Oh yeah, I knew Sidecar would be mentioned, let's dive into that.
>
> Sidecar has a lot of endpoints / functionality; backup / restore is just part of that.
>
> What I proposed also has these advantages:
>
> 1) Every time you go to upload to some cloud storage provider, you need to add all the dependencies to Sidecar to do that. In the case of S3, we need to add S3 libs. What about Azure? We need to add a library which knows how to talk to Azure. Then GCP ... This is probably the reason why this "cloud specific" functionality was never part of Cassandra itself: by adding all the libraries with all their dependencies, we would bloat the tarball unnecessarily, have to track dependencies which might be incompatible, etc.
>
> However, you can also mount an S3 bucket to a system and it acts as any other native data dir. You can do the same with Azure (1), etc. But you do not need to depend on any library. It will just copy files. That's it. That means we are ready for whatever storage there might be, as long as it can be mounted locally. We would just have the same code for Azure, S3, NFS, a local disk ... anything.
>
> 2) I am not sure we should _force_ people to use Sidecar if there are far simpler ways to do the job. If we just enabled snapshots to be taken outside of the Cassandra data dir, there would be no reason to use Sidecar just to be able to back up snapshots, because Cassandra could do it itself. I think we should strive for doing as much as possible with the least amount of effort, and I do not think that taking care of Sidecar for each node in a cluster, configuring it and learning it should be mandatory. What if a respective business is not interested in running Sidecar and they just want to copy directly from Cassandra and be done with it? If we force people to use Sidecar, then somebody has to take care of all of that.
>
> I am not saying that Sidecar is not suitable for restoring / backing up, but I do not see anything wrong with having options.
>
> (1) https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-configuration
>
> On Sun, Jan 12, 2025 at 11:46 PM Jon Haddad <j...@rustyrazorblade.com> wrote:
>
>> Sounds like part of a backup strategy. Probably worth chiming in on the Sidecar issue: https://issues.apache.org/jira/browse/CASSSIDECAR-148.
>>
>> IIRC, Medusa and Tablesnap both upload a manifest and don't upload multiple copies of the same SSTables. I think this should definitely be part of our backup system.
>>
>> Jon
>>
>> On Sun, Jan 12, 2025 at 10:25 AM Štefan Miklošovič <smikloso...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> I would like to run this through the ML to gather feedback, as we are contemplating making this happen.
>>>
>>> Currently, snapshots are just hard links located in a snapshot directory, pointing to the live data directory. That is super handy as it occupies virtually zero disk space (until the underlying SSTables are compacted away, at which point their size "materializes").
>>>
>>> On the other hand, because it is a hard link, it is not possible to make hard links across block devices (the infamous "Invalid cross-device link" error). That means that snapshots can only ever be located on the very same disk Cassandra has its data dirs on. A minimal illustration of this limitation is sketched right below.
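>>>
>>> For illustration only - a minimal sketch with plain JDK NIO of why hard links cannot cross devices; the paths below are hypothetical:
>>>
>>>     import java.nio.file.*;
>>>
>>>     public class CrossDeviceLinkDemo
>>>     {
>>>         public static void main(String[] args) throws Exception
>>>         {
>>>             // hypothetical paths: SSTable on the fast local disk, link target on a different device (e.g. an NFS mount)
>>>             Path sstable = Paths.get("/var/lib/cassandra/data/ks/tbl/nb-1-big-Data.db");
>>>             Path link = Paths.get("/mnt/nfs/cassandra/snapshots/today/nb-1-big-Data.db");
>>>
>>>             try
>>>             {
>>>                 // hard links only work within a single block device / file system
>>>                 Files.createLink(link, sstable);
>>>             }
>>>             catch (FileSystemException e)
>>>             {
>>>                 // on Linux this surfaces as "Invalid cross-device link" (EXDEV)
>>>                 System.err.println("hard link failed: " + e.getMessage());
>>>             }
>>>         }
>>>     }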
>>>
>>> Imagine there is a company ABC which has a 10 TiB disk (or NFS share) mounted to a Cassandra node, and they would like to use that as cheap / cold storage for snapshots. They do not care about the speed of such storage, nor do they care about how much space the snapshots occupy. On the other hand, they do not want snapshots occupying the disk space where Cassandra has its data, because they consider that a waste of space. They would like to utilize the fast disk and its space for production data to the max, and snapshots might eat a lot of that space unnecessarily.
>>>
>>> There might be a configuration property like "snapshot_root_dir: /mnt/nfs/cassandra", and if a snapshot is taken, it would just copy SSTables there, but we need to be a little bit smart here. (By default, it would all work as it does now - hard links to snapshot directories located under Cassandra's data_file_directories.)
>>>
>>> Because it is a copy, it occupies disk space. But if we took 100 snapshots of the same SSTables, we would not want to copy the same files 100 times. There is a very handy way to prevent this - unique SSTable identifiers (under the already existing uuid_sstable_identifiers_enabled property) - so we could have a flat destination hierarchy where all SSTables are located in the same directory, and we would just check whether such an SSTable is already there before copying it. Snapshot manifests (currently manifest.json) would then contain all SSTables a logical snapshot consists of. A rough sketch of this copy-if-absent idea follows further below.
>>>
>>> This would be possible only for _user snapshots_. All snapshots taken by Cassandra itself (diagnostic snapshots, snapshots upon repairs, snapshots against all system tables, ephemeral snapshots) would continue to be hard links, and it would not be possible to locate them outside of the live data dirs.
>>>
>>> The advantages / characteristics of this approach for user snapshots:
>>>
>>> 1. Cassandra will be able to create snapshots located on different devices.
>>> 2. From an implementation perspective it would be totally transparent; there would be no specific code about "where" we copy. From the Java perspective, we would just copy as we copy anywhere else.
>>> 3. All the tooling would work as it does now - nodetool listsnapshots / clearsnapshot / snapshot. Same outputs, same behavior.
>>> 4. No need for external tools copying SSTables to the desired destination, custom scripts, manual synchronisation ...
>>> 5. Snapshots located outside of Cassandra's live data dirs would behave the same when it comes to snapshot TTL (a TTL on a snapshot means that after a given period of time it is automatically removed). This logic would be the same, hence there is no need to re-invent the wheel when it comes to removing expired snapshots from the operator's perspective.
>>> 6. Such a solution would deduplicate SSTables, so it would be as space-efficient as possible (though not as efficient as hard links, for the obvious reasons mentioned above).
>>>
>>> It seems to me that there has recently been a "push" to add more logic to Cassandra where it was previously delegated to external tooling; for example, the CEP around automatic repairs basically does what external tooling does, we just move it under Cassandra. We would love to get rid of a lot of tooling and custom-written logic around copying snapshot SSTables.
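>>>
>>> To make the copy-if-absent idea mentioned above concrete, here is a rough, purely illustrative sketch in plain JDK NIO - the class and method names are hypothetical and this is not the actual Cassandra code:
>>>
>>>     import java.nio.file.*;
>>>     import java.util.*;
>>>
>>>     public class SnapshotCopySketch
>>>     {
>>>         // copy each SSTable component into a flat directory, skipping files that are already there;
>>>         // with unique SSTable identifiers, an identical file name means an identical SSTable
>>>         static List<String> copyIfAbsent(List<Path> components, Path snapshotRootDir) throws Exception
>>>         {
>>>             Path flatDir = snapshotRootDir.resolve("sstables");
>>>             Files.createDirectories(flatDir);
>>>             List<String> names = new ArrayList<>();
>>>             for (Path component : components)
>>>             {
>>>                 Path target = flatDir.resolve(component.getFileName());
>>>                 if (!Files.exists(target))
>>>                     Files.copy(component, target, StandardCopyOption.COPY_ATTRIBUTES);
>>>                 names.add(component.getFileName().toString());
>>>             }
>>>             return names;
>>>         }
>>>
>>>         // write a per-snapshot manifest listing the SSTables the logical snapshot consists of
>>>         static void writeManifest(Path snapshotRootDir, String snapshotName, List<String> sstables) throws Exception
>>>         {
>>>             Path manifest = snapshotRootDir.resolve(snapshotName).resolve("manifest.json");
>>>             Files.createDirectories(manifest.getParent());
>>>             Files.write(manifest, ("{\"files\": [\"" + String.join("\", \"", sstables) + "\"]}").getBytes());
>>>         }
>>>     }
>>>
>>> The real implementation would of course reuse the existing snapshot machinery; the sketch only shows that nothing beyond plain file copies is needed.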
>>>
>>> From the implementation perspective, it would be just plain Java, without any external dependencies, etc. There seems to be a lot to gain from relatively straightforward additions to the snapshotting code.
>>>
>>> We did some serious housekeeping in CASSANDRA-18111, where we consolidated and centralized everything related to snapshot management, so we feel comfortable building logic like this on top of that. In fact, CASSANDRA-18111 was a prerequisite for this, because we did not want to base this work on the pre-18111 state of things when it comes to snapshots (it was all over the code base, with fragmented and duplicated logic, etc.).
>>>
>>> WDYT?
>>>
>>> Regards
>>