On 4/15/2019 10:50 PM, Gary R. Schmidt wrote:
On 16/04/2019 12:24, Marcio Demetrio Bacci wrote:
...
2. Is there any restriction on hosting the Bacula VM on a DRBD
volume in the hypervisor? In this VM I will implement ZFS
deduplication for my backups.
I /think/ you are asking if Bacula can handle fail-over during backup?
Short answer is "no". (I am willing to accept corrections on this,
but AFAICS Bacula doesn't checkpoint its output.)
I would say "mostly", rather than "no". On fail-over, running jobs will
eventually fail, since the FD clients will not reconnect and their open
TCP sockets will time out. AFAIK, there is no transparent TCP connection
migration. The virtual IP is migrated, but not the currently opened TCP
connections. However, this is not necessarily a bad thing. Bacula can be
configured to automatically re-run failed jobs, and there was some
reason for the fail-over in the first place. Do we really trust the
first part of the backup that was made while running on the failing
cluster node?
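For reference, the automatic re-run mentioned above is done with the standard Reschedule directives in the Bacula Job resource. A minimal sketch follows; the job, client, and storage names here are made up, and the intervals are only illustrative:

```
# Hypothetical Job resource; the Reschedule* directives are the standard
# Bacula mechanism for re-queuing a job that failed (e.g. at fail-over).
Job {
  Name = "nightly-backup"           # made-up name
  Type = Backup
  Client = example-fd               # made-up client
  FileSet = "Full Set"
  Schedule = "WeeklyCycle"
  Storage = File1
  Pool = Default
  Messages = Standard
  Reschedule On Error = yes         # re-queue the job if it fails
  Reschedule Interval = 30 minutes  # wait this long before retrying
  Reschedule Times = 2              # give up after two retries
}
```

With this in place, a job killed by a fail-over is retried automatically once the Director is back up, rather than waiting for the next scheduled run.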
I have experimented with running Dir and SD in a KVM VM on a two-node
Corosync/Pacemaker cluster using active/passive DRBD storage for the
VM's OS. Backup storage was via iSCSI provided by an existing NAS. I
found no major issues.
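A cluster like the one described might be wired up roughly as below with pcs. This is a sketch only: the resource names (bacula_drbd, r0, bacula_vm) and the libvirt config path are invented, and the exact pcs syntax for promotable clones and role-based colocation varies between pcs versions:

```
# Sketch only -- resource and device names are hypothetical.
# DRBD resource as a promotable (master/slave) clone:
pcs resource create bacula_drbd ocf:linbit:drbd drbd_resource=r0 \
    promotable promoted-max=1 promoted-node-max=1 clone-max=2 notify=true

# The KVM guest that runs bacula-dir and bacula-sd:
pcs resource create bacula_vm ocf:heartbeat:VirtualDomain \
    config=/etc/libvirt/qemu/bacula.xml

# Run the VM only where DRBD is promoted, and only after promotion:
pcs constraint colocation add bacula_vm with Promoted bacula_drbd-clone
pcs constraint order promote bacula_drbd-clone then start bacula_vm
```

The colocation and ordering constraints are what give the active/passive behavior: the VM follows the DRBD primary, and fencing (STONITH) must be configured for this to be safe.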
On fail-over, Pacemaker attempts to initiate a shutdown on the failing
node. If the shutdown succeeds, then the bacula-dir and bacula-sd
services are stopped in the normal way and it affects running jobs
exactly the same as if those services were stopped manually. If the
shutdown fails, then the node is STONITH'd, which has the same effect as
pulling the power on the node. In that case, the clients' actively
running jobs are orphaned and their TCP sockets eventually time out. In
either case, the running jobs are not successful, but as I stated
earlier, I don't see that as a major reason not to run Bacula in a VM.
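One mitigation for the orphaned-socket case is Bacula's Heartbeat Interval directive, which is available in the FD, SD, and Director daemon resources. It sends periodic keepalives on otherwise idle connections, so a dead peer is noticed sooner than the OS-level TCP timeout. The daemon name and the interval below are illustrative:

```
# bacula-fd.conf -- illustrative value; the directive itself is standard.
FileDaemon {
  Name = example-fd         # made-up name
  Heartbeat Interval = 60   # send a keepalive every 60 seconds so a dead
                            # SD/Dir connection is detected much sooner
}
```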
Currently, I run Dir and SD in a VM on a four-node cluster, but without
fail-over for the Bacula VM. The VM and its DRBD storage have to be
brought up manually, but it allows a single catalog, a single IP, and a
single set of config, log, and spool files that can be brought up in
seconds on any one of the nodes. The reason is that I do not have an
FC-attached tape library or other HA-capable backup device. To bring the
VM up on another node I have to move my backup devices (USB-attached
RDS and portable hard drives plus SATA drives in removable drive bays
using vchanger). Based on how it worked with iSCSI NAS storage, I believe
it would also work well with HA-capable tape hardware on FC (or iSCSI?).
It should also work (i.e. jobs running at fail-over will fail, but can be
automatically re-run) when storing backups on DRBD storage attached
locally to the VM. Of course, DRBD replication doubles the storage space
needed for volume files, which can be considerable. Since the running
jobs will fail at fail-over anyway, I see no reason for the volume files
to be HA. They just need to be accessible to the VM, for example on an
iSCSI-attached NAS.
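A file-based Device resource backed by such a NAS mount is ordinary Bacula SD configuration. A sketch, with a made-up device name and an assumed mount point for the iSCSI LUN:

```
# bacula-sd.conf -- hypothetical Device backed by an iSCSI/NAS mount.
Device {
  Name = NASFileStorage               # made-up name
  Media Type = File
  Archive Device = /mnt/nas-volumes   # assumed mount point of the iSCSI LUN
  LabelMedia = yes                    # let the SD label new volume files
  Random Access = yes
  AutomaticMount = yes
  RemovableMedia = no
  AlwaysOpen = no
}
```

Since the volumes live on the NAS rather than inside the VM's DRBD storage, the VM image stays small and the volume files need no replication of their own.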
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users