I have a product I need to back up data for.
It stores data in a "contentstore" which is structured in a way that makes
it very predictable what needs to be backed up every day.
Let me explain: every time a piece of content is created, it is stored under
this specific hierarchy:

contentstore_root
      |_YYYY (e.g. 2019)
            |_mm (1...12)
                  |_dd (1...31)
                        |_HH (1...24)
                              |_MM (1...59)
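
To make that concrete, here is roughly how a creation timestamp maps to its
folder. This is just an illustrative Python sketch, and it assumes the
components are zero-padded two-digit names, which may not match exactly what
the application writes:

    from datetime import datetime

    def content_dir(root, ts):
        # Directory a piece of content created at `ts` would land in,
        # assuming zero-padded components (e.g. 2019/03/07/14/35).
        return "/".join([root, ts.strftime("%Y"), ts.strftime("%m"),
                         ts.strftime("%d"), ts.strftime("%H"), ts.strftime("%M")])

    # A file created on 2019-03-07 at 14:35 would then live under
    # contentstore_root/2019/03/07/14/35/
    print(content_dir("contentstore_root", datetime(2019, 3, 7, 14, 35)))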

This directory structure is stored on a GFS filesystem, which has proven to
be very poor at traversing long lists of files (a caveat of all distributed
filesystems, I guess). For example, doing an `ls -lR` on the GFS mountpoint
is significantly slower than on a local extX FS. And this of course has an
impact on the time taken for incremental backups... to the point where
backing up the contentstore root (which contains tens of millions of files)
would take more than half a day.

Also, it should be noted that when a file is updated on the system, the old
version stays where it was, and the new version is stored as a new file
under the "new date directory path". When a file is deleted, it remains on
the filesystem for a while before it is moved to a trash folder by an
internal job of the application after a "grace period".

As a consequence, given the directory structure and the general behaviour
of the application, I don't feel like incremental backups are really
needed, and I'd like to get rid of them if possible to avoid those long
backups.
However, all the configurations I could come up with have huge drawbacks in
one way or another.
For example, a daily backup of the past day's folder + a monthly backup of
the past month's folder makes it very hard and complex to do a full restore
in case of disaster recovery.
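
For concreteness, what I had in mind for the daily job is to compute the
previous day's folder and hand only that path to the backup job (if I
understand the docs correctly, a Bacula FileSet can take its File entries
from the output of a script). A rough, untested Python sketch, with a
hypothetical mount point and again assuming zero-padded directory names:

    import os
    from datetime import date, timedelta

    CONTENTSTORE_ROOT = "/mnt/gfs/contentstore_root"  # hypothetical mount point

    def day_dir(root, d):
        # Path of a given day's folder, assuming zero-padded month/day names.
        return os.path.join(root, f"{d.year:04d}", f"{d.month:02d}", f"{d.day:02d}")

    # Print the single directory the daily job needs to include; the monthly
    # job would do the same but stop at the month component.
    yesterday = date.today() - timedelta(days=1)
    print(day_dir(CONTENTSTORE_ROOT, yesterday))

The pain point is that a full disaster-recovery restore would then have to
pull every monthly set plus the current month's daily sets back together.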

I'm really interested in knowing whether anybody is dealing with a similar
kind of backup and how they deal with it. Also, if you have any idea of what
features could be helpful in my case (I took a look at VirtualFull, but that
doesn't seem to really solve the burden of backup administration)...
in short, any insight is appreciated.

I'd like to avoid, as much as possible, having to deal with local (tar)
archiving, as we are talking about tens of TB of data here and can't really
afford to use twice the space just to be able to back it up.

Regards