Hi,

I have been thinking about, and working on, aspects of de-duplication for
the Bacula Storage Daemon, following my talk at the Bacula Developers'
Conference in September.

Both strands of work involve making changes to the Volume Format: 

Firstly, the current Volume format is strictly serialized, with no
alignment considerations, so data blocks that are 4kb-aligned within a
file [on a 4kb file boundary] are not stored with the same alignment in
disk Volumes, other than incidentally.

See the blog post at:
[http://blog.myunix.dk/2010/12/15/large-scale-disk-to-disk-backups-using-bacula-part-vi/]

An improvement that would let the underlying file system perform
disk-block de-duplication [as with ZFS] would be to split the process of
packing stream records into the Volume between data and non-data streams:

Block:
        Block Header
        Record Header
        <Packed non-data streams>
        <Record Header for aligned data, ends on 4kb alignment>
        <Aligned 4kb data block(s)>

        repeat ...

Details still need to be worked out for file tails and for whole files
smaller than 4kb.
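The arithmetic behind the layout above can be sketched in a few lines of
Python. This is a minimal sketch under stated assumptions: the function
name `aligned_layout` is hypothetical, the Record Header length is a
parameter rather than Bacula's real value, and the filler is assumed to be
something a reader can skip (e.g. padding inside the preceding record).
Whatever is left over after the whole 4kb blocks is simply reported as a
tail, since how tails are stored is still open.

```python
from typing import Tuple

ALIGN = 4096  # data blocks must start on a 4kb Volume boundary


def aligned_layout(offset: int, rec_header_len: int, data_len: int,
                   align: int = ALIGN) -> Tuple[int, int, int, int]:
    """Plan the aligned part of a Block starting at Volume `offset`.

    Returns (pad, data_start, full_blocks, tail_len):
      pad          filler bytes before the Record Header, so that the
                   header ends exactly on an `align` boundary
      data_start   Volume offset of the first aligned data block
      full_blocks  number of whole `align`-sized data blocks
      tail_len     leftover bytes (file tail, or a whole file < align)
    """
    pad = -(offset + rec_header_len) % align  # Python % is never negative
    data_start = offset + pad + rec_header_len
    full_blocks, tail_len = divmod(data_len, align)
    return pad, data_start, full_blocks, tail_len
```

For example, a 10000-byte data stream whose 24-byte Record Header would
start at offset 0 needs 4072 bytes of filler, after which two aligned 4kb
blocks start at offset 4096, leaving a 1808-byte tail.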

Would a change to "BB03" be an appropriate designation in the Volume
label to indicate the processing required?

Secondly, I am working on a disk-only Volume format in which the data
streams are stored independently of the Volume, and the Volume itself
contains only the sequence of [SHA1/SHA256] hashes [+ size/offset] needed
to regenerate the file content.
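A minimal Python sketch of such a hash-sequence Volume, assuming a
hypothetical fixed entry layout of a 32-byte SHA-256 digest, an 8-byte
span size and an 8-byte offset within the file, and a plain dictionary
standing in for the separate data store. `ENTRY`, `store_file` and
`restore_file` are illustrative names, not Bacula APIs, and fixed-size
spans are used here only for brevity; the spans could equally come from
the bup-style selection described in the next paragraph.

```python
import hashlib
import struct
from typing import Dict

# Hypothetical on-Volume entry: SHA-256 digest, span size, offset in file.
ENTRY = struct.Struct(">32sQQ")


def store_file(data: bytes, chunk_size: int, store: Dict[bytes, bytes]) -> bytes:
    """Put each span into the shared store (once per distinct digest) and
    return the Volume record: a packed sequence of ENTRY structs."""
    entries = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        store.setdefault(digest, chunk)  # de-duplication happens here
        entries.append(ENTRY.pack(digest, len(chunk), offset))
    return b"".join(entries)


def restore_file(volume_record: bytes, store: Dict[bytes, bytes]) -> bytes:
    """Regenerate the file content by looking up each digest in the store."""
    out = bytearray()
    for digest, size, offset in ENTRY.iter_unpack(volume_record):
        chunk = store[digest]
        if len(chunk) != size or len(out) != offset:
            raise ValueError("Volume record does not match the data store")
        out += chunk
    return bytes(out)
```

Backing up a second, slightly longer copy of a file adds only the spans
that differ to the store; the Volume records for both copies still
regenerate their files exactly.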

The concept, although not the implementation, comes from the 'bup'
package on Sourceforge: a cyclic CRC is used to select the spans of data
to hash and store, with spans averaging 8kb in size in the initial
implementation.
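Content-defined span selection of this kind can be sketched as follows.
Note that this is not the cyclic CRC described above and not bup's code:
for brevity it substitutes a simpler rsync-style rolling sum over an
assumed 64-byte window, and cuts a span wherever the low 13 bits of the
weighted sum are all ones, which on random data gives spans averaging
about 2^13 = 8192 bytes. `split_chunks` and `hash_chunks` are
hypothetical names.

```python
import hashlib
from typing import List, Tuple

WINDOW = 64                         # rolling window size in bytes (assumed)
CHUNK_BITS = 13                     # 2**13 = 8192 -> ~8kb average span
CHUNK_MASK = (1 << CHUNK_BITS) - 1
MASK32 = 0xFFFFFFFF


def split_chunks(data: bytes) -> List[bytes]:
    """Cut `data` into spans whose boundaries depend only on the last
    WINDOW bytes seen, so boundaries move with the content."""
    window = bytearray(WINDOW)      # ring buffer of recent bytes, starts zeroed
    s1 = s2 = 0                     # s1: sum of window; s2: position-weighted sum
    chunks, start = [], 0
    for i, byte in enumerate(data):
        out = window[i % WINDOW]    # oldest byte leaves the window
        window[i % WINDOW] = byte
        s1 = (s1 - out + byte) & MASK32
        s2 = (s2 - WINDOW * out + s1) & MASK32
        if (s2 & CHUNK_MASK) == CHUNK_MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):           # trailing bytes form the last span
        chunks.append(data[start:])
    return chunks


def hash_chunks(chunks: List[bytes]) -> List[Tuple[bytes, int]]:
    """(SHA-256 digest, size) per span: what the Volume would record."""
    return [(hashlib.sha256(c).digest(), len(c)) for c in chunks]
```

Because each cut decision depends only on the last 64 bytes, inserting a
few bytes near the start of a file changes only the span(s) around the
insertion; later spans hash to the same values and are stored only once.
A production version would probably also bound the minimum and maximum
span size.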

Similar issues arise in how to specify the volume format to the SD.

Suggestions? New stream IDs? Label changes?

Thirdly, is anyone else working along similar lines?

Regards, and Happy New Year,

Howard


-- 
Howard Thomson <howard.thom...@dial.pipex.com>


_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel