The short answer is "no". The longer answer is "it depends". The most concise discussion I've seen is Inktank's Multi-site option whitepaper: http://info.inktank.com/multisite_options_with_inktank_ceph_enterprise

That whitepaper only addresses RBD backups (using snapshots) and RadosGW backups (using RadosGW replication). The first option in the whitepaper, a single cluster spanning multiple locations, isn't a backup.
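To make the snapshot-based RBD approach a bit more concrete, here is a rough sketch of how an incremental offsite copy could be assembled from `rbd export-diff` and `rbd import-diff`. This is my own illustration, not something taken from the whitepaper. The pool, image, snapshot, and host names are hypothetical, and it assumes the remote image already holds the previous snapshot. The function only builds the command string; it doesn't run anything.

```python
import shlex


def rbd_incremental_cmd(pool, image, prev_snap, new_snap, remote_host):
    """Build a shell pipeline that ships the changes between two RBD
    snapshots to another cluster over ssh.

    All argument values are placeholders supplied by the caller. The remote
    image is assumed to already contain `prev_snap`.
    """
    spec = f"{pool}/{image}"
    # Export only the blocks that changed between prev_snap and new_snap,
    # writing the diff to stdout ("-").
    export = ["rbd", "export-diff", "--from-snap", prev_snap,
              f"{spec}@{new_snap}", "-"]
    # On the remote side, read the diff from stdin ("-") and apply it.
    remote = " ".join(shlex.quote(a) for a in ["rbd", "import-diff", "-", spec])
    send = ["ssh", remote_host, remote]
    return " | ".join(
        " ".join(shlex.quote(a) for a in cmd) for cmd in (export, send)
    )


if __name__ == "__main__":
    print(rbd_incremental_cmd("rbd", "vm1", "s1", "s2", "backup.example.com"))
```

Keeping the older snapshots around on the remote side is what gives you single-item restore; the replication step by itself only covers the disaster case.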

I'm not aware of any backup or offsite capability for raw RADOS pools.


There really aren't any good options for backing up CephFS. You could use rsync on CephFS, but it's not going to work well. rsync to offsite locations begins to have problems around the TB size, give or take an order of magnitude. The exact spot depends on your bandwidth, latency, file count, average file size, average file churn, and disk I/O on both sides. It takes a lot of time and disk I/O to enumerate all the files on the filesystem and compare them to the offsite copy.

CephFS does have some nice features that could make for an efficient backup. If rsync (or any backup client) were aware of the way CephFS handles directory sizes and timestamps, it could skip unchanged subtrees and prune the directory tree enumeration much more efficiently. That should scale well to much larger file systems, mostly limited by file churn and churn locality. I don't know of anybody who's working on that. I'm interested in the concept, but I have no plans (personal or professional) to use CephFS.
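As a minimal sketch of that pruning idea (not an existing tool, and not something I've run against a production filesystem): CephFS exposes recursive directory statistics as virtual extended attributes, and I'm assuming here that `ceph.dir.rctime` holds the newest change time anywhere below a directory, formatted as seconds followed by a fractional part. The function names are made up for the example.

```python
import os
from typing import Callable, Iterator


def read_rctime(path: str) -> float:
    """Read the recursive ctime of a CephFS directory.

    Assumes the virtual xattr is named ceph.dir.rctime and that its value
    starts with whole seconds before the first dot.
    """
    raw = os.getxattr(path, "ceph.dir.rctime").decode()
    return float(raw.split(".")[0])


def changed_dirs(root: str, last_backup: float,
                 rctime: Callable[[str], float] = read_rctime) -> Iterator[str]:
    """Yield directories whose subtree changed after last_backup.

    A directory whose recursive ctime is not newer than last_backup is
    skipped together with everything below it, so unchanged subtrees are
    never enumerated.
    """
    if rctime(root) <= last_backup:
        return
    yield root
    with os.scandir(root) as entries:
        subdirs = sorted(e.path for e in entries
                         if e.is_dir(follow_symlinks=False))
    for sub in subdirs:
        yield from changed_dirs(sub, last_backup, rctime)
```

A backup client would then compare or copy files only inside the directories this yields, so the cost of a run tracks how much changed rather than the total file count.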


I'm currently working on adding snapshot capabilities to RadosGW. Combined with replication, snapshots can protect against disasters, PEBKAC, and application errors. Replication alone protects against disasters, but not against PEBKAC or application errors, just as RAID protects against disk failure but not file deletion.


Replication + Snapshots (for both RadosGW and RBD) don't protect against a determined attacker. Even tape is vulnerable to a determined attacker with a high security level in your organization. The trick with both offline backups and remote snapshots is to set up enough barriers and checks that things get noticed before a determined attacker can finish the job. It's easier to do with offline backups than online backups.




*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com


On 4/2/14 00:08 , Robert Sander wrote:
Hi,

what are the options to consistently back up and restore
data out of a ceph cluster?

- RBDs can be snapshotted.
- Data on RBDs used inside VMs can be backed up using tools from the guest.
- CephFS data can be backed up using rsync or similar tools

What about object data in other pools?

There are two scenarios where a backup is needed:

- disaster recovery, i.e. the whole cluster goes nuts
- single item restore, because PEBKAC or application error

Is there any work in progress to cover these?

Regards


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
