On Tue, Apr 6, 2010 at 5:19 PM, Robert LeBlanc <rob...@leblancnet.us> wrote:
> On Tue, Apr 6, 2010 at 12:37 AM, Craig Ringer
> <cr...@postnewspapers.com.au> wrote:
>
> [snip]
>
>> Is this insane? Or a viable approach to tackling some of the
>> complexities of faking tape backup on disk as Bacula currently tries to do?
>
> I love Bacula and have been working hard to promote it to people I
> know. The biggest problem with Bacula is its disk management. We have
> a DataDomain box that is getting a horrible dedup rate, and after looking
> at the Bacula tape stream format, I can understand why. There is so
> much extra data inserted into the stream that is very helpful for tape
> drives that it makes deduping the data nearly impossible.
>
> I would love to see the stream simplified for disk-based storage.
> Another thing I'd like the option for is to be able to specify a block
> size and start a file on the block boundary; you could use sparse files
> to skip the space without taking it up. This would allow dedup
> algorithms to really be able to compress Bacula data much better. It
> would be awesome if the file stored in the Bacula stream looked
> exactly like it does on the file system, so that if you do any tier 3
> storage with dedup and run your Bacula backups to the same storage, you
> get free backups.
>
> Dedup is gaining a lot of traction; name your favorite vendor, or as
> I'm doing, look at lessfs. All of these would benefit hugely from a
> smart SD that knows how to handle disk storage better, and it would make
> Bacula much more attractive. With the types of backups we are doing, we
> should be getting 10x easy on our DataDomain, but we are lucky to get
> 4x, and I think that mostly comes from compression.
>
> Thanks,
>
> Robert LeBlanc
> Life Sciences & Undergraduate Education Computer Support
> Brigham Young University
>
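To make the block-alignment idea from my earlier message a bit more
concrete, here is a rough Python sketch. It is not Bacula code and it is
not the Bacula volume format: BLOCK_SIZE, align_up and append_aligned are
names I made up, and the header bytes stand in for whatever per-file
record the SD would write. The only point is that the file data starts on
a block boundary and the gap before it is skipped with a seek, which file
systems with sparse-file support can usually store as a hole.

```python
import os

BLOCK_SIZE = 4096  # assumption: would be a configurable SD option


def align_up(offset, block_size=BLOCK_SIZE):
    """Round offset up to the next multiple of block_size."""
    return -(-offset // block_size) * block_size


def append_aligned(volume, header, payload, block_size=BLOCK_SIZE):
    """Append header at the end of the volume, then write payload
    starting at the next block boundary. Returns the payload offset.

    volume must be a seekable binary file NOT opened in append mode,
    because append mode ignores the seek before the write.
    """
    volume.seek(0, os.SEEK_END)
    volume.write(header)                      # hypothetical per-file record
    start = align_up(volume.tell(), block_size)
    volume.seek(start)                        # skip the gap instead of padding
    volume.write(payload)                     # file data begins block-aligned
    return start
```

With a 4096-byte block and a 4-byte header written to an empty volume,
the payload would start at offset 4096. Whether this actually helps a
given dedup engine depends on how it chunks data, so treat it as an
illustration of the idea only.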
So, still thinking about this: is there any reason not to have a
hierarchical file structure for disk-based backup rather than a
serialized stream? Here are my thoughts; any comments are welcome so we
can have a good discussion about this.

SD_Base_Dir
 +- PoolA
 +- PoolB
     +- JobID1
     +- JobID2
         +- Clientinfo.bacula (Bacula serial file that holds information
         |                     similar to the block header)
         +- Original File Structure (file structure from the client is
             |                       maintained and repeated here; allows
             |                       browsing of files outside of Bacula)
             +- ClientFileA
             +- ClientFileA.bacula (Bacula serial file that holds
             |                      information similar to the Unix file
             |                      attribute package)
             +- ClientFileB
             +- ClientFileB.bacula
             +- ClientDirA
             +- ClientDirA.bacula

Although it's great to reuse code, I think something like this would be
very beneficial to disk-based backups. It would help increase dedup
rates, and some file systems like btrfs and ZFS may be able to take
advantage of linked files (there has been some discussion on the btrfs
list about things like this). It would also allow the backup to reside
on any file system, since all the ACLs and attribute information are
serialized in separate files, which keeps unique data out of the blocks
of possibly duplicated data. I think we could even reuse a lot of the
serialization code, so it would differ only in how it writes the stream
of data.

Please excuse me if I'm way off here; I'm just trying to think outside
the box a little.

Robert LeBlanc
Life Sciences & Undergraduate Education Computer Support
Brigham Young University
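P.S. Here is a rough Python sketch of the layout above, just to have
something concrete to poke at. Nothing in it is existing Bacula code:
store_job is a made-up name, JSON stands in for the "Bacula serial file"
format, only a few stat fields stand in for the attribute package, and I
am assuming the client's tree goes into its own subdirectory (called
"files" here) so it cannot collide with Clientinfo.bacula. Symlinks and
special files are recorded in a sidecar but their data is not copied.

```python
import json
import os
import shutil
from pathlib import Path


def store_job(sd_base, pool, job_id, client_root, client_info):
    """Mirror client_root under <sd_base>/<pool>/<job_id>/files and write
    a '<name>.bacula' sidecar next to every file and directory."""
    job_dir = Path(sd_base) / pool / str(job_id)
    files_dir = job_dir / "files"          # hypothetical subdirectory name
    files_dir.mkdir(parents=True, exist_ok=True)

    # Job-level information, similar in spirit to the block header.
    (job_dir / "Clientinfo.bacula").write_text(json.dumps(client_info))

    client_root = Path(client_root)
    for dirpath, dirnames, filenames in os.walk(client_root):
        rel = Path(dirpath).relative_to(client_root)
        dest_dir = files_dir / rel
        dest_dir.mkdir(parents=True, exist_ok=True)

        for name in dirnames + filenames:
            src = Path(dirpath) / name
            st = src.lstat()
            # Stand-in for the serialized attribute package (no ACLs here).
            attrs = {"mode": st.st_mode, "uid": st.st_uid,
                     "gid": st.st_gid, "mtime": st.st_mtime}
            (dest_dir / (name + ".bacula")).write_text(json.dumps(attrs))

            # File data is stored byte-for-byte, with nothing mixed in,
            # so identical client files produce identical stored files.
            if name in filenames and src.is_file() and not src.is_symlink():
                shutil.copyfile(src, dest_dir / name)
    return job_dir
```

One open question the sketch makes obvious: a client file whose own name
ends in ".bacula" would clash with a sidecar, so a real implementation
would need some escaping or a separate metadata tree.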
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users