Hi,

I have been thinking about, and working on, aspects of de-duplication for the Bacula Storage Daemon, following my talk at the Bacula Developer's Conference in September.
Both strands of work involve changes to the Volume format.

Firstly, the current Volume format is strictly serialized, with no alignment considerations. Data blocks that are 4 KB aligned on a 4 KB boundary in the source file are therefore not stored similarly aligned on disk Volumes, other than incidentally. See the blog at:

[http://blog.myunix.dk/2010/12/15/large-scale-disk-to-disk-backups-using-bacula-part-vi/]

An improvement, enabling the underlying file-system to do disk block de-duplication [as with ZFS], would be to split the packing of stream records into the Volume between data and non-data streams:

    Block:
      Block Header
      Record Header
      <Packed non-data streams>
      <Record Header for aligned data, ends on 4 KB alignment>
      <Aligned 4 KB data block(s)>
      repeat ...

Details are still needed for file tails and for whole files smaller than 4 KB.

Would a change to "BB03" be an appropriate designation for the Volume labelling, to indicate the processing required?

Secondly, I am working on a disk-only Volume format where the data streams are stored independently of the Volume, and the Volume contains only the sequence of [SHA1/SHA256] hashes [+ size/offset] needed to regenerate the file content. The concept, although not the implementation, is from the 'bup' package on Sourceforge. It uses a cyclic CRC to select the spans of data to hash and store, averaging 8 KB in size for the initial implementation.

Similar issues arise in how to specify the Volume format to the SD. Suggestions? New stream IDs? Label changes?

Thirdly, is anyone else working along similar lines?

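To make the alignment idea in the first proposal concrete, here is a minimal Python sketch of the padding arithmetic. It is not Bacula code: the function names, the 12-byte record header length, the zero-byte padding placed before the data record header, and the assumption that each block itself starts on a 4 KB boundary are all illustrative choices.

```python
ALIGN = 4096  # 4 KB alignment, as in the proposal


def padding_before_data(offset: int, header_len: int, align: int = ALIGN) -> int:
    """Padding bytes needed so that offset + padding + header_len is a multiple of align."""
    return -(offset + header_len) % align


def pack_block(prefix: bytes, data_header: bytes, data: bytes, align: int = ALIGN):
    """Lay out one block: packed non-data bytes, padding, data record header, data.

    `prefix` stands for the block header plus the packed non-data streams.
    Assumes the block starts at an aligned offset in the Volume (hypothetical).
    Returns the block bytes and the offset of the data within the block.
    """
    pad = padding_before_data(len(prefix), len(data_header), align)
    data_offset = len(prefix) + pad + len(data_header)
    block = prefix + b"\0" * pad + data_header + data
    return block, data_offset


# Hypothetical sizes: 100 bytes of headers/non-data, a 12-byte record header.
block, data_offset = pack_block(b"h" * 100, b"r" * 12, b"d" * 8192)
print(data_offset, data_offset % ALIGN)
```

With this layout the record header for the data ends exactly on a 4 KB boundary, so identical 4 KB file blocks land in identical disk blocks. File tails and files under 4 KB are not handled here.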
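The second proposal (spans chosen by a rolling checksum, stored once by hash, with the Volume holding only the hash sequence) can be sketched roughly as follows. This is an illustration under assumptions, not the bup algorithm or a proposed Bacula implementation: the rsync-style rolling sum, the 64-byte window, the 13-bit split mask (one boundary per 8192 bytes on average for random data), the in-memory dict used as the chunk store, and all function names are hypothetical.

```python
import hashlib
from collections import deque

WINDOW = 64                    # rolling window in bytes (assumed)
SPLIT_BITS = 13                # boundary about every 2**13 = 8192 bytes
MASK = (1 << SPLIT_BITS) - 1


def split_spans(data: bytes):
    """Yield content-defined spans: boundaries depend only on the last WINDOW bytes."""
    window = deque([0] * WINDOW, maxlen=WINDOW)
    a = b = 0                  # a: plain sum of window, b: position-weighted sum
    start = 0
    for i, new in enumerate(data):
        old = window[0]
        window.append(new)     # maxlen drops the oldest byte
        a = (a - old + new) & 0xFFFF
        b = (b - WINDOW * old + a) & 0xFFFF
        if (b & MASK) == MASK:  # low bits all ones: end the span here
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]     # trailing span without a boundary


def store_file(data: bytes, store: dict) -> list:
    """Store each span once, keyed by SHA-256; return the (hash, size) recipe."""
    recipe = []
    for span in split_spans(data):
        digest = hashlib.sha256(span).hexdigest()
        store.setdefault(digest, span)
        recipe.append((digest, len(span)))
    return recipe


def restore_file(recipe: list, store: dict) -> bytes:
    """Regenerate file content from the hash sequence."""
    return b"".join(store[digest] for digest, _ in recipe)
```

Because a boundary depends only on the preceding 64 bytes, inserting or deleting data early in a file changes the spans near the edit, and later spans keep their hashes. The recipe returned by `store_file` corresponds to what the Volume would hold in place of the data stream.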
Regards, and Happy New Year,

Howard
-- 
Howard Thomson <howard.thom...@dial.pipex.com>

_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel