Thanks Ahu and Hussein. So my understanding is:
1. Commit log backup is not documented for Apache Cassandra, hence it is not a standard approach. But it can be used to restore on the same machine (taking the backup from commit_log_dir). If used on other machine(s), they have to be in the same topology. Can it be used for a replacement node?
2. For periodic backup, Snapshot + Incremental backup is the best option.

Thanks,
Adarsh Kumar

On Fri, Nov 29, 2019 at 7:28 AM guo Maxwell <cclive1...@gmail.com> wrote:
> Hossein is right. But for us, we restore to the same Cassandra topology, so the commit logs are usable for replay. When restoring to the same machine they are also usable.
> Using sstableloader costs too much time and more storage (though the extra storage is reclaimed after the restore).
>
> On Thu, Nov 28, 2019 at 7:40 PM Hossein Ghiyasi Mehr <ghiyasim...@gmail.com> wrote:
>
>> Commitlog backup isn't usable on another machine.
>> The backup solution depends on what you want to do: periodic backup, or backup to restore on another machine?
>> Periodic backup is a combination of snapshot and incremental backup. Remove the incremental backups after a new snapshot.
>> To take a backup to restore on another machine: you can use a snapshot after flushing the memtable, or use sstableloader.
>>
>>
>> ----
>> VafaTech.com - A Total Solution for Data Gathering & Analysis
>>
>> On Thu, Nov 28, 2019 at 6:05 AM guo Maxwell <cclive1...@gmail.com> wrote:
>>
>>> In the Cassandra and DataStax documentation, commitlog backup is not mentioned.
>>> Only snapshot and incremental backup are described for doing backups.
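Hossein's periodic-backup recipe (a snapshot plus incremental backups, deleting the incremental backups once a newer snapshot exists) comes down to a pruning rule. The sketch below shows that rule in Python under stated assumptions. It is not a Cassandra tool. `BackupFile` and `prunable_incrementals` are hypothetical names, and the paths are illustrative. It assumes you already know each incremental backup file's modification time and the time the new snapshot started. These are the files Cassandra writes under a table's backups/ directory when incremental_backups is enabled. The reasoning: any SSTable that existed before the snapshot started is captured by that snapshot, so its incremental copy is redundant. Files written at or after the snapshot start are kept.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BackupFile:
    # Hypothetical record for one file under a table's backups/ directory.
    path: str
    mtime: float  # modification time, seconds since the epoch


def prunable_incrementals(
    files: Iterable[BackupFile], snapshot_started_at: float
) -> List[BackupFile]:
    """Return incremental backup files that the new snapshot already covers.

    Files written strictly before the snapshot started are redundant; files
    written at or after that moment are kept, since they may hold newer data.
    """
    redundant = [f for f in files if f.mtime < snapshot_started_at]
    return sorted(redundant, key=lambda f: f.path)


files = [
    BackupFile("ks1/table1/backups/nb-1-big-Data.db", 100.0),
    BackupFile("ks1/table1/backups/nb-2-big-Data.db", 250.0),
]
print([f.path for f in prunable_incrementals(files, snapshot_started_at=200.0)])
```

The comparison is strict on purpose. `nodetool snapshot` flushes memtables first, so files produced by that flush may carry timestamps at the snapshot boundary. Keeping them is the conservative choice.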
>>>
>>> Though commitlog archiving per keyspace/table is not supported, commitlog replay (you must put the logs into commitlog_dir and restart the process) does support a keyspace/table replay filter: use -Dcassandra.replayList with the keyspace1.table1,keyspace1.table2 format to replay only the specified keyspaces/tables.
>>>
>>> Snapshots do affect storage. We take a snapshot once a week during off-peak business hours, and taking the snapshot is throttled. You may want to look at this issue: https://issues.apache.org/jira/browse/CASSANDRA-13019
>>>
>>>
>>>
>>> On Thu, Nov 28, 2019 at 1:00 AM Adarsh Kumar <adarsh0...@gmail.com> wrote:
>>>
>>>> Thanks Guo and Eric for replying,
>>>>
>>>> I have some confusion about commit log backup:
>>>>
>>>> 1. The commit log archival technique (https://support.datastax.com/hc/en-us/articles/115001593706-Manual-Backup-and-Restore-with-Point-in-time-and-table-level-restore- ) is as good as an incremental backup, as it also captures commit logs after memtable flush.
>>>> 2. If we go for "Snapshot + Incremental bk + Commit log", we have to take the commit logs from the commit log directory (is there any SOP for this?). As commit logs are not per table or keyspace, restoring selected tables will be a challenge.
>>>> 3. Snapshot-based backups are easy to manage and operate due to their simplicity, but they are heavy on storage. Any views on this?
>>>> 4. Please share any successful strategy that someone is using in production. We are still in the design phase and want to implement the best solution.
>>>>
>>>> Thanks Eric for sharing the link for Medusa.
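The selective-table restore concern in point 2 is what the -Dcassandra.replayList filter from guo's reply addresses. That filter takes a comma-separated list of keyspace.table entries. The hypothetical Python helper below only builds that option string. The identifier check is a simplifying assumption that accepts unquoted names made of letters, digits, and underscores. Where you add the option, for example JVM_OPTS in cassandra-env.sh, depends on your installation.

```python
import re
from typing import Iterable, List, Tuple

# Simplifying assumption: unquoted identifiers (letters, digits, underscore).
_IDENT = re.compile(r"[A-Za-z0-9_]+")


def replay_list_option(tables: Iterable[Tuple[str, str]]) -> str:
    """Build a -Dcassandra.replayList option from (keyspace, table) pairs.

    Uses the keyspace1.table1,keyspace1.table2 format from the thread and
    drops duplicate pairs while keeping the caller's order.
    """
    entries: List[str] = []
    for keyspace, table in tables:
        for name in (keyspace, table):
            if not _IDENT.fullmatch(name):
                raise ValueError(f"unexpected identifier: {name!r}")
        entry = f"{keyspace}.{table}"
        if entry not in entries:
            entries.append(entry)
    if not entries:
        raise ValueError("at least one keyspace.table pair is required")
    return "-Dcassandra.replayList=" + ",".join(entries)


print(replay_list_option([("keyspace1", "table1"), ("keyspace1", "table2")]))
# prints: -Dcassandra.replayList=keyspace1.table1,keyspace1.table2
```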
>>>>
>>>> Regards,
>>>> Adarsh Kumar
>>>>
>>>> On Wed, Nov 27, 2019 at 5:16 PM guo Maxwell <cclive1...@gmail.com> wrote:
>>>>
>>>>> For me, I think the last one,
>>>>> Snapshot + Incremental + commitlog,
>>>>> is the most meaningful way to do backup and restore when you back the data up somewhere else, such as AWS S3.
>>>>>
>>>>> - Snapshot based backup // data written after the snapshot is not backed up, so data may be lost when restoring to a time later than the snapshot time;
>>>>> - Incremental backups // better than snapshot backup, but with insufficient data accuracy: data still in the memtable will be lost;
>>>>> - Snapshot + incremental
>>>>> - Snapshot + commitlog archival // better data precision than incremental backup, but data in commitlogs that have not been archived yet (because the segment is not yet closed) will not be restored and will be lost. Also, when there are too many logs, log replay takes a very long time.
>>>>>
>>>>> We use snapshot + incremental + commitlog archive. We read the snapshot data and the incremental data, and the logs are also backed up. But we do not back up logs whose data has already been flushed to SSTables, because that data is already backed up by the incremental backup.
>>>>>
>>>>> This way, the data exists in SSTable format through the snapshot and incremental backups, the number of logs is very small, and log replay does not take much time.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 27, 2019 at 4:13 PM Eric LELEU <e...@strapdata.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> TheLastPickle & Spotify have released Medusa as a Cassandra backup tool.
>>>>>>
>>>>>> See:
>>>>>> https://thelastpickle.com/blog/2019/11/05/cassandra-medusa-backup-tool-is-open-source.html
>>>>>>
>>>>>> Hope this link will help you.
>>>>>>
>>>>>> Eric
>>>>>>
>>>>>>
>>>>>> On 27/11/2019 at 08:10, Adarsh Kumar wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was looking into backup strategies for Cassandra. After some study I learned that there are the following options:
>>>>>>
>>>>>> - Snapshot based backup
>>>>>> - Incremental backups
>>>>>> - Snapshot + incremental
>>>>>> - Snapshot + commitlog archival
>>>>>> - Snapshot + Incremental + commitlog
>>>>>>
>>>>>> Which is the most suitable and feasible approach? Also, which of these is used most?
>>>>>> Please let me know if there is any other option or tool available.
>>>>>>
>>>>>> Thanks in advance.
>>>>>>
>>>>>> Regards,
>>>>>> Adarsh Kumar
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> you are the apple of my eye !
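Earlier in the thread, guo describes backing up only the commit logs whose data has not yet been flushed, because snapshots and incremental backups already cover everything else. That selection step could look like the Python sketch below, which rests on several assumptions. It assumes segment files are named CommitLog-<version>-<id>.log, with ids increasing over time. It also takes `oldest_unflushed_id` as a hypothetical input. Cassandra does not hand you this value in this form, so how you derive it depends on your own tooling. Every segment with a lower id is assumed to be fully flushed and is skipped.

```python
import re
from typing import Iterable, List, Tuple

# Assumed segment naming: CommitLog-<version>-<id>.log, ids increasing over time.
_SEGMENT = re.compile(r"CommitLog-(\d+)-(\d+)\.log")


def segments_to_archive(filenames: Iterable[str], oldest_unflushed_id: int) -> List[str]:
    """Select segments that may still hold data not yet flushed to SSTables.

    oldest_unflushed_id is a hypothetical input. Segments with a lower id are
    assumed fully flushed, so snapshot + incremental backups already cover them.
    """
    selected: List[Tuple[int, str]] = []
    for name in filenames:
        match = _SEGMENT.fullmatch(name)
        if match is None:
            continue  # not a commit log segment
        segment_id = int(match.group(2))
        if segment_id >= oldest_unflushed_id:
            selected.append((segment_id, name))
    # Oldest first, so the list reads in write order.
    return [name for _, name in sorted(selected)]


print(segments_to_archive(
    ["CommitLog-7-102.log", "CommitLog-7-100.log", "CommitLog-7-101.log", "notes.txt"],
    oldest_unflushed_id=101,
))
```

Keeping the archived set this small is what makes replay fast in guo's setup: only the tail of writes that never reached an SSTable has to be replayed.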