Hi all,

I just encountered a corrupt block in one of my on-disk volumes on the backup server. That's an issue in and of itself, but what I wanted to raise was a problem it created when restoring from the damaged volume.
After starting the storage daemon with the -p option so that it wouldn't abort completely after detecting the checksum error, attempts to restore from the volume failed with:

  02-Feb 14:23 HOSTNAME-sd JobId 6746: Error: block.c:318 Volume data error at 6:669857422! Block checksum mismatch in block=409842 len=64512: calc=c044b2ce blk=5df394c1
  02-Feb 14:23 HOSTNAME-fd JobId 6746: Error: attribs.c:421 File size of restored file /var/spool/cyrus/restore2/mnt/cyrus_mail_snap/mail/user/USERNAME/Sent/1733. not correct. Original 5886048, restored 2359296.
  02-Feb 14:23 HOSTNAME-dir JobId 6746: Error: Bacula backup-dir 2.4.4 (28Dec08): 02-Feb-2009 14:23:57

It appears that the director or fd was aborting the job completely if one file failed to restore. I was able to prevent that with some butchery of attribs.c so I could restore my backup sans the file containing the damaged block, but I thought this issue was worth raising on the list since one damaged block REALLY must not prevent a backup from being restored.

Perhaps the restore job should have an additional configurable parameter "errors" with options "abort" or "continue"?

The volume in question contains files that were stored with the Options { compression = gzip; signature = MD5; }.

I also think that the error message from the bacula-sd needs to point out the "-p" option, e.g.:

  02-Feb 14:23 HOSTNAME-sd JobId 6746: Error: block.c:318 Volume data error at 6:669857422! Block checksum mismatch in block=409842 len=64512: calc=c044b2ce blk=5df394c1. Fatal; restart HOSTNAME-sd with the -p flag to attempt to continue after errors.

... especially since "-p" isn't documented in the man page, only in the bacula-sd usage summary. You have to know that it's the sd that is responsible for aborting the job, and that the option to tell it to behave differently exists. That's more research than should need to be done when one's trying to get a server back up and running!
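To make the suggested "errors" parameter concrete, here is a rough Python sketch of the two behaviours. It is not Bacula code and does not reflect how the director, fd or sd are actually structured; every name in it (ErrorPolicy, RestoreError, restore_all, demo_restore_one) is hypothetical and exists only to illustrate the difference between "abort" and "continue".

```python
from enum import Enum
from typing import Callable, Iterable, List, Tuple


class ErrorPolicy(Enum):
    """Hypothetical values for the proposed "errors" restore parameter."""
    ABORT = "abort"
    CONTINUE = "continue"


class RestoreError(Exception):
    """Hypothetical error: one file could not be restored intact."""


def restore_all(
    files: Iterable[str],
    restore_one: Callable[[str], None],
    policy: ErrorPolicy = ErrorPolicy.ABORT,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Restore each file; return (restored paths, [(failed path, reason)])."""
    restored: List[str] = []
    failed: List[Tuple[str, str]] = []
    for path in files:
        try:
            restore_one(path)
        except RestoreError as exc:
            # Always record the failure so it can be reported at job end.
            failed.append((path, str(exc)))
            if policy is ErrorPolicy.ABORT:
                break  # "abort": give up on the rest of the job
            continue   # "continue": skip this file and carry on
        restored.append(path)
    return restored, failed


def demo_restore_one(path: str) -> None:
    """Stand-in restorer that fails for one file, mimicking a damaged block."""
    if path.endswith("1733."):
        raise RestoreError("size mismatch: original 5886048, restored 2359296")


if __name__ == "__main__":
    files = ["Sent/1732.", "Sent/1733.", "Sent/1734."]
    for policy in ErrorPolicy:
        restored, failed = restore_all(files, demo_restore_one, policy)
        print(policy.value, "restored:", restored, "failed:", failed)
```

With "continue", the one damaged file is reported as failed and everything else is still restored, which is the behaviour I had to hack into attribs.c by hand.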
In closing, I'd like to note that despite this recent frustrating experience, I've been delighted with Bacula, and really appreciate the time and effort that kind people have put into it in their spare time. Having done my own fair share of OSS dev work, I know how much difference it can make to have people notice and appreciate your work - and trust me, yours has made a WORLD of difference to my sanity when managing a complex network of machines with several different OSes, absurd volumes of data, and clumsy users. For example, being able to effortlessly restore the newspaper's production files after a user accidentally deleted them on deadline morning was a lifesaver.

--
Craig Ringer

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users