Yes, I am aware of these techniques for keeping hash codes.  For lots of files,
the amount of information that must be kept is enormous.  Just do the
mathematics: 1,000,000 files (times about 40 machines), each file with an
average of 20 MB of data, gives hash codes and links of approx 100 GB.
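
A rough sketch of that arithmetic follows. It is only an estimate under stated assumptions: the 256 kB block size is taken from the quoted report, the file count is read as 1,000,000 files on each of 40 machines, and the 32 bytes per entry (a 16-byte digest plus 16 bytes of link overhead) is a guess for illustration, not a Bacula figure. The function name is hypothetical.

```python
import math

KIB = 1024
MIB = 1024 * KIB


def hash_index_bytes(files_per_machine, machines, avg_file_bytes,
                     block_bytes, bytes_per_entry):
    """Estimate the storage needed for one hash entry per block of every file."""
    # A partial trailing block still needs its own hash, hence the ceiling.
    blocks_per_file = math.ceil(avg_file_bytes / block_bytes)
    return files_per_machine * machines * blocks_per_file * bytes_per_entry


total = hash_index_bytes(
    files_per_machine=1_000_000,  # assumption: the count is per machine
    machines=40,
    avg_file_bytes=20 * MIB,
    block_bytes=256 * KIB,        # block size from the quoted report
    bytes_per_entry=16 + 16,      # assumption: 16-byte digest + 16-byte link
)
print(f"{total / 1e9:.1f} GB")    # prints "102.4 GB"
```

With those inputs each file has 80 blocks, so the index holds 3.2 billion entries; a larger digest or more per-entry overhead scales the total linearly.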

Apart from that, Bacula is not currently designed to work this way, and it
would take some rather major modifications.  Since users ranked this item
priority 12, I see little possibility of it being implemented in the
immediate future.

That said, I think *enormous* savings in backup Volume space can be obtained 
by implementing #5 on the list (Base jobs).  

On Tuesday 03 January 2006 01:52, [EMAIL PROTECTED] wrote:
> The following bug has been SUBMITTED.
> ======================================================================
> http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000520
> ======================================================================
> Reported By:                ari
> Assigned To:
> ======================================================================
> Project:                    bacula
> Bug ID:                     520
> Category:                   other
> Reproducibility:            always
> Severity:                   feature
> Priority:                   normal
> Status:                     new
> ======================================================================
> Date Submitted:             01-02-2006 16:52 PST
> Last Modified:              01-02-2006 16:52 PST
> ======================================================================
> Summary:                    Directive/mode to backup only file changes, not entire file
> Description:
> An idea for item 12 at http://www.bacula.org/?page=projects. Rather than
> having to compare a file to the backed up file on disk, here is another
> approach:
>
> 1. Create a hash (MD5, SHA, whatever) for each 256kB of each file. Store
> that hash on disk or in the database. So a 1.1MB file will have 5 hashes
> stored.
>
> 2. When backing up the file for a second time, compare the hashes for each
> 256kB block and only store the blocks which have changed.
>
> The above will work well for files which grow by having data added to the
> end or which have blocks of data change in content (eg. database data
> files) but not in size within the file (eg a mailbox where a message at
> the top of the file might be deleted). Removing one byte from the top of
> the file would cause the entire file to be resaved.
>
> The author of rsync has some interesting things to say about how to avoid
> this problem: http://samba.anu.edu.au/~tridge/phd_thesis.pdf but I can't
> quite see how to apply them to a case in Bacula where we cannot guarantee
> access to the entire file we need to compare to (ie the previous backup).
>
> However, implementing the algorithm above will result in benefits for some
> types of files. I don't know whether the benefit would be commensurate to
> the work involved or the data storage in the SQL database.
>
> One workaround might be:
> 1. if two blocks in a row don't match their hash
> 2. step one byte at a time, plus and minus from the starting point of the
> second block, calculating hashes until a match is found.
>
> Obviously much more work to determine the effectiveness of this approach
> is needed. Speed of hash calculation and real world trials would be
> factors in this decision. Almost sounds like another Phd thesis.... :-)
> ======================================================================
>
> Bug History
> Date Modified  Username       Field                    Change
> ======================================================================
> 01-02-06 16:52 ari            New Bug
> ======================================================================
>
>
> _______________________________________________
> Bacula-bugs mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/bacula-bugs
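
For reference, steps 1 and 2 of the quoted proposal can be sketched roughly as below. This is an illustration only, not Bacula code: the function names are hypothetical, SHA-256 is picked arbitrarily (the report says "MD5, SHA, whatever"), and the hashes are held in memory rather than on disk or in the catalog database.

```python
import hashlib

BLOCK_SIZE = 256 * 1024  # 256 kB blocks, as in the quoted report


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one hex digest per fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old_hashes, new_data, block_size=BLOCK_SIZE):
    """Return the indexes of blocks that are new or differ from the stored hashes."""
    new_hashes = block_hashes(new_data, block_size)
    return [i for i, digest in enumerate(new_hashes)
            if i >= len(old_hashes) or digest != old_hashes[i]]


# A 1.1 MB file with non-repeating content per block yields 5 hashes.
old = bytes(i % 251 for i in range(1_100_000))
stored = block_hashes(old)

appended = old + b"new data at the end"
shifted = old[1:]  # one byte removed from the top of the file

print(len(stored))                       # 5
print(changed_blocks(stored, appended))  # [4]
print(changed_blocks(stored, shifted))   # [0, 1, 2, 3, 4]
```

The last line shows the weakness the report itself points out: removing a single byte from the top shifts every block boundary, so every block appears changed and the whole file is saved again.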

-- 
Best regards,

Kern

  (">
  /\
  V_V


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
