[initially posted to the wrong address, sorry if it reaches you anyway]

Hi all

I'm a rather happy Bacula user and have been following the lists quietly
for a while. I'm piping up with some ideas and comments based on using
Bacula for a couple of years for my work's backup needs.

During my own use of Bacula for an off-site HDD-based backup setup, I've
noticed an increasing number of difficulties that all stem from a single
basic origin, and thought it worth raising here.

Bacula was designed as a network backup system for tape storage. The
director and especially sd design reflect this. Yet more and more people
are using off-site disk as their primary backup medium, not just as a
staging point for tape backups.

Increasingly, I'm coming to think that it'd be desirable to have a
dedicated storage daemon for HDD storage. This sd would eliminate much of
the complexity of managing disk volumes for fast, concurrent backups,
especially if the director was aware of disk-based storage daemons and
their capabilities.

Issues I struggle with right now:

- I need to define a lot of different devices on the disk-backed sd, so
  that various backups may concurrently write to the SD. Each "device" is
  really just a subdirectory of the main backup store, with its own media
  type. The only alternative is interleaving onto one big volume, which
  is a nightmare if the different backups have different retention
  periods and disk storage isn't infinite. My backup setup isn't huge
  (6TB storage for backups) yet I have:

    $ grep ^Device /etc/bacula/bacula-sd.conf | wc -l
    10

  ... Device entries and matching director Storage entries.

- The need for all these different storage device definitions bloats the
  sd config, as each device needs a whole bunch of redundant and
  repetitive config in its definition.

- Each Device{} entry for the SD needs a corresponding Storage{} entry on
  the director, bloating the director config too.

- Effective volume lifetime management requires the definition of MANY
  pools, and association of those pools with storage devices.
  It's harder than it could be to reliably predict disk storage
  requirements so the backup device doesn't fill up, and to ensure that
  volumes are retained as long as they need to be. Doing it well requires
  lots and lots of pools, usually three per job or class of job. If
  storage is known to be disk based and one volume per job is forced, it
  could be simpler to configure disk-based pools and storage.

To address these issues, a disk-only SD might:

- Have exactly one storage root. The admin can mount volumes under it,
  use symlinks, or use bind mounts if some storage devices need to use
  different file systems/partitions/LVs. So everything might live under
  /storage/root (for the sake of this example).

- Treat any requested device name as a subdirectory of that storage root.
  So if the director requests the device "Archival" then volumes will be
  created/accessed in the directory /storage/root/Archival, where the
  target directory will be created if it does not already exist. The sd
  wouldn't require configuration of devices; the fact that the director
  requested it would be considered enough configuration, since all
  devices would have the following implicit config:

    Device {
      Name = $DEVICENAME
      Media Type = File_$DEVICENAME
      Archive Device = $STORAGEROOT/$DEVICENAME
      SpoolDirectory = $STORAGEROOT/$DEVICENAME/spool
      LabelMedia = yes;
      Random Access = Yes;
      AutomaticMount = yes;
      RemovableMedia = no;
      AlwaysOpen = no;
    }

- Assume one volume per job, and expect the director to force this for
  disk-based SDs. This would simplify volume management.

- Allow a device to be open multiple times with different volumes. So,
  the "Archival" device might be writing to "Archival-002" and
  "Archival-003" at the same time, while another job has "Archival-001"
  mounted for read-verify.
  This would eliminate the need to define lots of storage devices that
  aren't actually any different, just so that many backups may be in
  progress on the sd at once without volume interleaving and without the
  need for huge spool files. I'm not sure this is possible without an
  extension of the dir<->sd protocol, but as that's not stable
  release-to-release that shouldn't be a big issue.

- (An alternative to the above) write spool files as valid volumes in
  their proper target locations but with a temporary file name. When
  they're written, rather than despooling them, simply mv() them into
  place. This could only work with one volume per job, but that should be
  forced for disk-based SDs anyway.

- Maybe implement par2 for damaged volume repair & recovery, since one
  volume per job means that volumes will never be appended to, only
  truncated or deleted.

The director, when using a disk-based sd, would:

- Require that disk-based SDs be declared as such in their Storage {}
  entries, and refuse to talk to a disk-based sd not declared as such.

- Know that it can open a device on a disk-based sd multiple times with
  *different* volumes without the need for spooling. Currently the
  director can let multiple jobs use a device, but only if they share the
  same volume. It's expected that the sd will spool the jobs or will
  interleave the data on the volume. Neither is necessary or desirable
  for disk storage; the sd can just write to multiple volume files within
  a directory at once.

- Send a "delete volume file" message to the disk sd when a volume is
  deleted from the catalog. Similarly, when a volume is purged, send a
  "truncate volume file" message to the disk sd.

- Support an alternative form of Storage {} definition for disk-based
  storage, where multiple device names may be listed.
  So instead of:

    Storage {
      # Max concurrent jobs = 1, no spooling required
      Name = File_Archival
      Address = backup
      SDPort = 9103
      Password = "XXXXXXXXXXX"
      Device = FileStorage_Archival
      Media Type = File_Archival
      Maximum Concurrent Jobs = 1
    }

    Storage {
      # Max concurrent jobs = 1, no spooling required
      Name = File_CyrusMail
      Address = backup
      SDPort = 9103
      Password = "XXXXXXXXXXX"
      Device = FileStorage_CyrusMail
      Media Type = File_CyrusMail
      Maximum Concurrent Jobs = 1
    }

    Storage {
      # Jobs using this device must spool!
      Name = File_HomeDir
      Address = backup
      SDPort = 9103
      Password = "XXXXXXXXXXX"
      Device = FileStorage_HomeDir
      Media Type = File_HomeDir
      Maximum Concurrent Jobs = 4
    }

    .... etc ....

  one could write:

    DiskStorage {
      Name = DiskSD
      Devices = "File_Archival", "File_CyrusMail", "File_HomeDir"
      Address = backup
      SDPort = 9103
      Password = "XXXXXXXXXXX"
    }

Is this insane? Or a viable approach to tackling some of the complexities
of faking tape backup on disk as Bacula currently tries to do?

--
Craig Ringer

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
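
To make the "implicit config" idea above a bit more concrete, here is a rough sketch in Python of how a disk-only sd might derive a Device definition from nothing but the storage root and the device name the director asked for. This is only an illustration, not Bacula code: the function name implicit_device and the rule restricting device names to plain identifiers (so that a request like "../etc" cannot escape the storage root) are my own assumptions, and the directive list is simply copied from the implicit config shown earlier in this message.

```python
import posixpath
import re

# Hypothetical rule (not from Bacula): device names are plain identifiers,
# so a requested name can never point outside the storage root.
_DEVICE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def implicit_device(storage_root: str, device_name: str) -> str:
    """Render the implicit Device block for a requested device name."""
    if not _DEVICE_NAME.fullmatch(device_name):
        raise ValueError(f"invalid device name: {device_name!r}")

    # Each device is just a subdirectory of the single storage root.
    device_dir = posixpath.join(storage_root, device_name)
    spool_dir = posixpath.join(device_dir, "spool")

    lines = [
        "Device {",
        f"  Name = {device_name}",
        f"  Media Type = File_{device_name}",
        f"  Archive Device = {device_dir}",
        f"  SpoolDirectory = {spool_dir}",
        "  LabelMedia = yes;",
        "  Random Access = Yes;",
        "  AutomaticMount = yes;",
        "  RemovableMedia = no;",
        "  AlwaysOpen = no;",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example from the text: the director requests the "Archival" device.
    print(implicit_device("/storage/root", "Archival"))
```

Run as a script, this prints the Device block for "Archival" with its archive directory at /storage/root/Archival. The point is only that every field is a pure function of the root and the requested name, so nothing needs to be written by hand per device.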