Hi all,

the following observations might be useful for other users looking to
speed up their backups. They may be specific to our local system
architecture, but we hope they also help others achieve shorter backup
runtimes.

Setup:
------

The file server is an EMC VNX5200 unified storage system, which we use as a
NAS NFSv4-mounted by the clients (workstations [via 1G Eth] and servers [via
multiple 10G Eth]). The NAS is attached by a dedicated 10G Ethernet link to
our backup server (Dell R630, 3.4 GHz 6-core Xeon, 128 GB memory), which
hosts all Bacula 7.4.4 daemons (DIR, SD, FD) and PostgreSQL as catalog DB.
We back up to a Tandberg LTO-6 StorageLoader (single drive), attached by
SAS. The Bacula spool volume consists of a single large btrfs RAID-0 volume
composed of four 800 GB Intel SSD drives. We had already tuned PostgreSQL
performance (achieving a 3h reduction in backup time) before the experiments
described here.

Bacula configuration:
---------------------

We back up two small local file systems and three larger NFSv4-mounted file
systems from the NAS, all in a single job, plus a final catalog backup. This
description is slightly simplified: there are a few more clients, but their
backups are very small and do not affect total runtime much. We use Full
and Accurate Differential backups (see table below).

Problem:
--------

Ever since getting the VNX5200 operational, the backups worked, but they ran
excessively slowly (far from the line rate of the 10G NAS-backup server
link, see table at end). However, when doing streaming file operations
(e.g., a tar) via NFSv4 from the NAS to the backup server, the full line
rate was easily reached.

Hypothesis:
-----------

What leads to the slow backups is not the actual data transfer rate (which
the NAS managed with only very minor CPU load when tar'ing), but the
_latency_ incurred by the many meta-data operations (stat(), fstat() etc.)
performed during backup.
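
As a rough illustration of this hypothesis, the following Python sketch
compares the time spent waiting on serialized meta-data round trips with the
time needed to move the data at line rate. The function names, the file
count, the operations per file and the round-trip time are all hypothetical
values chosen for illustration, not measurements from the system described
here.

```python
def metadata_time_hours(n_files, ops_per_file, rtt_seconds, concurrency=1):
    """Hours spent waiting on meta-data round trips.

    Assumes every operation costs one full round trip and that
    `concurrency` independent streams overlap their waits perfectly.
    """
    total_ops = n_files * ops_per_file
    return total_ops * rtt_seconds / concurrency / 3600.0


def transfer_time_hours(size_gb, rate_mb_per_s):
    """Hours needed to stream size_gb (1 GB = 1000 MB) at a given rate."""
    return size_gb * 1000.0 / rate_mb_per_s / 3600.0


if __name__ == "__main__":
    # Hypothetical workload: 10 million files, 2 meta-data calls per file,
    # 0.5 ms per NFS round trip.
    serial = metadata_time_hours(10_000_000, 2, 0.0005)
    overlapped = metadata_time_hours(10_000_000, 2, 0.0005, concurrency=9)
    # 3483 GB at the nominal 10G line rate of 1250 MB/s.
    streaming = transfer_time_hours(3483, 1250)
    print(f"meta-data wait, serial:     {serial:.2f} h")
    print(f"meta-data wait, 9 streams:  {overlapped:.2f} h")
    print(f"data transfer at line rate: {streaming:.2f} h")
```

Under these assumed numbers the serialized meta-data wait is several times
longer than the pure data transfer, which is the kind of imbalance the
hypothesis describes.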

Proposed Solution:
------------------

To hide the latency of these operations, we split the single large backup
job into nine smaller backup jobs of roughly equal size, which were then run
_concurrently_ (setting Maximum Concurrent Jobs = 20 in bacula-dir.conf).
The idea was to saturate the 10G link with the meta-data operations from
multiple backup jobs, thus hiding their individual latency (similar to the
SIMT multi-threading on graphics cards for hiding memory latencies). Since
we are streaming to an SSD-based spool, we do not incur any seek penalties
due to mixing multiple jobs on the same spool volume.
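
The principle can be tried outside Bacula with a small Python sketch that
stats every file in several directory trees, either one tree after another
or all trees at once. This is not how Bacula scans files; the function names
and the worker count of 9 are assumptions for illustration. On a
high-latency NFS mount the concurrent variant would be expected to finish
sooner, because the waits of the individual lstat() calls overlap.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def scan_tree(root):
    """Walk one directory tree, lstat() every file, return (files, bytes)."""
    n_files = 0
    n_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            n_files += 1
            n_bytes += st.st_size
    return n_files, n_bytes


def scan_concurrently(roots, workers=9):
    """Scan several trees at once so their meta-data round trips overlap.

    Threads are sufficient here: os.lstat() releases the interpreter lock
    while it waits for the file system.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(scan_tree, roots))
    return sum(f for f, _ in results), sum(b for _, b in results)
```

Timing `scan_tree` over each root in turn against one call to
`scan_concurrently` on the same roots gives a quick estimate of how much
latency hiding a given mount allows before any backup jobs are split.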

Final Runtime Results, showing typical job sizes:
-------------------------------------------------

Backup        Before optimization          After optimization
Full:         17.5 h (3483 GB)             10.0 h (3712 GB)
Differential: 14.6 h ( 285 GB)              6.6 h ( 314 GB)
#Jobs:        3 + 1 (3 small + 1 big job)   3 + 9 (12 smaller jobs)
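
For comparison with the link speed, the table can be converted into average
throughput and runtime speed-up. The sketch below uses only the figures from
the table; it assumes 1 GB = 1000 MB and a nominal 10G line rate of
1250 MB/s without protocol overhead. The rates are averages over the whole
run, including the small jobs and the catalog backup.

```python
LINE_RATE_MB_S = 10e9 / 8 / 1e6  # nominal 10 Gbit/s in MB/s (1250.0)

# (runtime in hours, data volume in GB), taken from the table above
RUNS = {
    "Full": {"before": (17.5, 3483), "after": (10.0, 3712)},
    "Differential": {"before": (14.6, 285), "after": (6.6, 314)},
}


def rate_mb_s(hours, gigabytes):
    """Average throughput in MB/s over the whole run."""
    return gigabytes * 1000.0 / (hours * 3600.0)


def summarize(runs):
    """Return average rates and the runtime speed-up per backup level."""
    out = {}
    for level, r in runs.items():
        (h0, g0), (h1, g1) = r["before"], r["after"]
        out[level] = {
            "before_mb_s": round(rate_mb_s(h0, g0), 1),
            "after_mb_s": round(rate_mb_s(h1, g1), 1),
            "runtime_speedup": round(h0 / h1, 2),
        }
    return out


if __name__ == "__main__":
    for level, s in summarize(RUNS).items():
        print(level, s, f"(line rate {LINE_RATE_MB_S:.0f} MB/s)")
```

By this estimate the Full backup goes from about 55 MB/s to about 103 MB/s
(runtime speed-up 1.75x) and the Differential from about 5.4 MB/s to about
13.2 MB/s (2.21x), all still well below the nominal line rate.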

Discussion:
-----------

As you can see, we achieved the largest performance increase for our
Accurate differential backup, which (we assume) performs far more meta-data
queries than the Full backups. The disadvantage, of course, is that we now
have to keep an eye on the individual sizes of our partitioned backup jobs
to make sure they are still reasonably balanced.
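
One way to re-balance the partitions from time to time is a greedy
"largest first" assignment, sketched below in Python. This is not the
procedure used for the results above; the function name and the example
paths and sizes are hypothetical. The weights can be bytes or, if meta-data
latency dominates as hypothesized, file counts.

```python
import heapq


def partition_by_size(sizes, n_jobs):
    """Greedy balancing: give the largest remaining item to the lightest job.

    sizes  -- dict mapping a path to its weight (bytes or file count)
    n_jobs -- number of backup jobs to fill
    Returns (jobs, totals): the paths per job and the summed weight per job.
    """
    heap = [(0, i) for i in range(n_jobs)]  # (current total, job index)
    heapq.heapify(heap)
    jobs = [[] for _ in range(n_jobs)]
    by_weight = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)
    for path, weight in by_weight:
        total, idx = heapq.heappop(heap)  # lightest job so far
        jobs[idx].append(path)
        heapq.heappush(heap, (total + weight, idx))
    totals = [0] * n_jobs
    for total, idx in heap:
        totals[idx] = total
    return jobs, totals


if __name__ == "__main__":
    # Hypothetical top-level directories with weights in GB.
    example = {"/nas/a": 8, "/nas/b": 7, "/nas/c": 6, "/nas/d": 5, "/nas/e": 4}
    jobs, totals = partition_by_size(example, 2)
    for paths, total in zip(jobs, totals):
        print(total, paths)
```

The resulting path lists could then be copied into one FileSet per job.
Comparing the returned totals against the sizes reported by recent backups
shows when the partitions have drifted apart.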

Future Work:
------------

It appears that such trickery might be unnecessary if the Bacula FD could
perform something similar (hiding the latency of individual meta-data
operations) on-the-fly, e.g. by executing in a multi-threaded fashion. This
has been proposed as Item 15 in the Bacula `Projects' list since November
2005 but does not appear to have been implemented yet (?).

We hope this contribution helps other Bacula users and welcome comments on
similar experiences, or even better ways to achieve higher speed-ups on the
same hardware.

Many thanks go to Florian Stock for coming up with the idea and collecting
the data!

Best,
  Andreas Koch

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users