On 27/05/2020 23:17, Alan Brown wrote:
I've been running Bacula for ~15 years (community/enterprise) and have
identified a few areas which are in desperate need of improvement:
For an "enterprise" grade backup system, it's amazingly fragile in a few
areas (particularly in actual Enterprise networks!)
Bacula DOES NOT LIKE and does not handle network interruptions _at all_
if backups are in progress. This _will_ cause backups to abort - and
these aborted backups are _not_ resumable
Similarly, if there's any kind of disruption between the director and
database, the only fix is to restart the director
What that means is that Bacula _cannot_ be used with a High Availability
database because network interruptions (when switching servers) are part
of the HA paradigm.
It also means that operators have to be _extremely_ careful about
allowing automated or other system upgrades
In days of multi-TB backup sets, this is turning into a showstopping
problem.
As we are an Enterprise customer this has been raised with Baculasystems
but has been given _very_ low priority. I'd like to hear opinions from
the wider community on this.
Opinion: I know bugs aren't sexy to work on but these need fixing, not
being brushed off. This is the difference between LAN-quality and actual
Enterprise grade software.
I do not consider these to be bugs - they aren't simple errors where
someone made a mistake or used the wrong sized variable. Making them go
away requires a large amount of re-design and reimplementation of
Bacula's communication modules, the scheduler, and no doubt other bits.
Bacula started life twenty years ago, and the environment has changed
since then. While Bacula has kept up with some things (disk as a target
rather than tape, for example), something like re-startable jobs is, as
I have said, not just an extension or addition to what is there, but a
big change to a large part of Bacula.
And that's a massive risk. It's the sort of task I would want a whole
team working on: a couple of designers, six to ten programmers, and a
QA team with a nasty manager who is free to say "No!" when things don't
work quite right.
And the mob above would all need a *really* good understanding of how
the various bits of Bacula work and interact, and would have to be
capable of, and allowed to, replace ancient groaning bits of code with
newer versions that just aren't as wrong. (First task - rename all files
so the extensions represent the C++ code inside them, and for the really
cruddy^Wannoying stuff, G++.)
And, from the commercial stand-point, the changes would have to be made
without interrupting the existing income stream.
Then there's the projected time-line before it could be released.
I don't want to think about that. Bacula is fragile as it is; ripping it
apart and stitching it back together would be a massive task!
And Bacula does not have that capability, neither in the OSS space nor
in the Enterprise space.
All the above said, I think that re-startable jobs would be a great
enhancement for Bacula, but how often and for how long does it try by
default before giving up? :->
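To make that question concrete: any answer needs at least two limits,
one on the number of attempts and one on the total time spent trying.
Below is a minimal sketch in Python of what such a policy might look
like. Every name and default in it (RetryPolicy, run_with_retry, the 5
attempts, the one hour) is made up for illustration; none of it comes
from Bacula's code, and it says nothing about how resuming a
half-written job would actually work.

```python
import time
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    """Hypothetical retry limits; these names and defaults are not Bacula's."""
    max_attempts: int = 5        # "how often" to try
    max_elapsed: float = 3600.0  # "for how long" to keep trying, in seconds
    base_delay: float = 30.0     # wait before the first retry, in seconds
    max_delay: float = 600.0     # cap on any single wait, in seconds


def run_with_retry(step, policy, sleep=time.sleep, clock=time.monotonic):
    """Call step(attempt) until it succeeds or the policy is exhausted.

    step gets the attempt number (starting at 1) and is expected to raise
    ConnectionError when the network drops. sleep and clock are injectable
    so the policy can be exercised without real waiting.
    """
    start = clock()
    delay = policy.base_delay
    attempt = 1
    while True:
        try:
            return step(attempt)
        except ConnectionError:
            elapsed = clock() - start
            out_of_attempts = attempt >= policy.max_attempts
            out_of_time = elapsed + delay > policy.max_elapsed
            if out_of_attempts or out_of_time:
                raise  # give up and let the caller mark the job as failed
            sleep(delay)
            delay = min(delay * 2, policy.max_delay)  # exponential backoff
            attempt += 1
```

With the defaults above, a job would wait 30s, 60s, 120s and 240s
between its five attempts and then give up. Whether those are sane
defaults for a multi-TB backup is exactly the open question.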
Cheers,
Gary B-)
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users