Bacula DOES NOT LIKE and does not handle network interruptions _at all_
if backups are in progress. This _will_ cause backups to abort - and
these aborted backups are _not_ resumable.
Hi,
My feeble two cents is that this has been a bit of an Achilles heel for
us even though we are a LAN backup environment (i.e. backups don't leave
our local network). We are still running an older, slightly customized
version of community Bacula, so I have not explored the option to
restart stopped jobs that has come with newer versions.
Given that, I can recall that when we initially deployed our "backups to
disk" setup, I would see backups of large file systems (e.g. 1TB)
write three quarters of their data to volumes and then error out due to
some random network interruption. I didn't like the idea that this meant
e.g. 750GB of our volume space was taken up by an errored/incomplete
job that would never be used. Because of this, I had to implement
spooling, which people typically do only when their backups are being
written to sequential media (tape). So, we now spool all jobs to
dedicated spool disks, and then Bacula writes that data to the disk data
volumes. It fixed the "cruft" issue and made large backups more stable
(along with other options). But I can imagine a scenario where we would
not have had to do this if Bacula could more easily recover from network
glitches and automatically restart jobs where they last left off
(thinking along the lines of the concept of checkpointing in an RDBMS).
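To make the checkpointing idea a bit more concrete, here is a toy Python
sketch. It is not how Bacula works internally, and none of the names
(send, run_job, LinkDropped, fail_at) come from Bacula; they are made up
for illustration. The "network" is just an in-memory copy that is forced
to fail once, and the checkpoint is a plain dict rather than anything
durable.

```python
CHUNK = 4  # bytes per chunk; tiny so the behaviour is easy to follow


class LinkDropped(Exception):
    """Hypothetical stand-in for a network interruption."""


def send(data, dest, checkpoint, fail_at=None):
    """Copy data into dest, starting from checkpoint["offset"].

    The offset is advanced only after a chunk is fully written, so an
    interruption costs at most the chunk that was in flight.
    """
    offset = checkpoint["offset"]
    del dest[offset:]  # discard anything written past the last checkpoint
    while offset < len(data):
        if fail_at is not None and offset >= fail_at:
            raise LinkDropped(f"link dropped at byte {offset}")
        chunk = data[offset:offset + CHUNK]
        dest.extend(chunk)
        offset += len(chunk)
        checkpoint["offset"] = offset  # commit progress
    return offset


def run_job(data, fail_at):
    """Run a job that is interrupted once, then resumed from the checkpoint."""
    dest = bytearray()
    checkpoint = {"offset": 0}
    starts = []  # offset each attempt started from
    while True:
        starts.append(checkpoint["offset"])
        try:
            # Only the first attempt is interrupted in this toy example.
            send(data, dest, checkpoint, fail_at if len(starts) == 1 else None)
            return bytes(dest), starts
        except LinkDropped:
            continue  # retry; send() picks up at the saved offset


if __name__ == "__main__":
    copied, starts = run_job(b"0123456789abcdef", fail_at=10)
    print(copied, starts)
```

The point is only that the second attempt starts at the saved offset
instead of at byte 0, so the data already written is not wasted. A real
implementation would have to keep the checkpoint somewhere that survives
a crash and make sure both ends agree on it, which is where the hard
part is.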
As someone else said, this would require non-trivial changes to Bacula
(i.e. I won't be making those changes to our version - :) ) and the
devil would be in the details in practice. Still, if it were put to a
vote, I'd probably vote for this as "a nice feature to have."
cheers,
--tom
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users