Hi all,

the following observations might be useful for other users looking to speed up their backups. They might be specific to our local system architecture, but we hope that they also help others to achieve shorter backup runtimes.

Setup:
------

The file server is an EMC VNX5200 unified storage system, which we use as a NAS, NFSv4-mounted by the clients (workstations [via 1G Eth] and servers [via multiple 10G Eth]). The NAS is attached by a dedicated 10G Ethernet link to our backup server (Dell R630, 3.4 GHz 6-core Xeon, 128 GB memory), which hosts all Bacula 7.4.4 daemons (DIR, SD, FD) and PostgreSQL as catalog DB. We back up to a Tandberg LTO-6 StorageLoader (single drive), attached by SAS. The Bacula spool volume consists of a single large btrfs RAID-0 volume composed of four 800 GB Intel SSD drives.

We had already tuned PostgreSQL performance (which achieved a 3 h reduction in backup time) before the experiments described here.

Bacula configuration:
---------------------

We back up each of two small local and three larger NFSv4-mounted file systems from the NAS, all in a single job, plus a final catalog backup. This is a bit simplified here: there are a few more clients, but they have only very small backups that do not affect total runtime much. We use Full and Accurate Differential backups (see table below).

Problem:
--------

Ever since getting the VNX5200 operational, the backups worked, but they ran excessively slowly (far from the line rate of the 10G link between NAS and backup server, see table at end). However, when doing streaming file operations (e.g., a tar) via NFSv4 from the NAS to the backup server, the full line rate was easily reached.

Hypothesis:
-----------

What leads to the slow backups is not the actual data transfer rate (which the NAS managed with only very minor CPU load when tar'ing), but the _latency_ incurred by the many meta-data operations (stat(), fstat() etc.) performed during backup.

Proposed Solution:
------------------

To hide the latency of these operations, we split the single large backup job into nine smaller backup jobs of roughly equal size, which were then run _concurrently_ (setting Concurrent Jobs = 20 in bacula-dir.conf).
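
As a rough illustration of how one might arrive at partitions of roughly equal size, here is a minimal Python sketch using a greedy "largest first" heuristic. This is only an assumption about one workable approach, not a description of how the split above was actually done. The function name `partition_by_size`, the directory names and the sizes are all hypothetical; in practice the sizes would come from something like `du`.

```python
import heapq


def partition_by_size(dir_sizes_gb, n_jobs):
    """Assign directories to n_jobs groups with roughly equal total size.

    Greedy heuristic: take directories from largest to smallest and always
    add the next one to the group that is currently the lightest.
    """
    # Min-heap of (total size so far, job index); the lightest job is on top.
    heap = [(0.0, i) for i in range(n_jobs)]
    heapq.heapify(heap)
    jobs = [[] for _ in range(n_jobs)]

    # Place big directories first so the small ones can even out the totals.
    by_size = sorted(dir_sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for path, size in by_size:
        total, idx = heapq.heappop(heap)
        jobs[idx].append(path)
        heapq.heappush(heap, (total + size, idx))

    # Collect the final per-job totals in job-index order.
    totals = [0.0] * n_jobs
    for total, idx in heap:
        totals[idx] = total
    return jobs, totals


if __name__ == "__main__":
    # Hypothetical top-level directories and sizes in GB (illustration only).
    sizes = {"/nas/home": 900, "/nas/proj1": 700, "/nas/proj2": 650,
             "/nas/scratch": 400, "/nas/archive": 300, "/nas/misc": 120}
    jobs, totals = partition_by_size(sizes, 3)
    for paths, total in zip(jobs, totals):
        print(f"{total:7.1f} GB  {paths}")
```

Each resulting group would then become the file list of one backup job. The heuristic only balances by data volume; since the bottleneck discussed here is meta-data latency, balancing by file count instead might be worth trying as well.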
The idea was to saturate the 10G link with the meta-data operations from multiple backup jobs, thus hiding their individual latency (similar to the SIMT multi-threading on graphics cards for hiding memory latencies). Since we are streaming to an SSD-based spool, we do not incur any seek penalties due to mixing multiple jobs on the same spool volume.

Final Runtime Results, showing typical job sizes:
-------------------------------------------------

Backup          Before optimization            After optimization
Full:           17.5 h (3483 GB)               10.0 h (3712 GB)
Differential:   14.6 h ( 285 GB)                6.6 h ( 314 GB)
#Jobs:          3 + 1 (3 small + 1 big job)    3 + 9 (12 smaller jobs)

Discussion:
-----------

As you can see, we achieved the largest performance increase for our Accurate Differential backup, which (we assume) performs far more meta-data queries than the Full backups.

The disadvantage, of course, is that we now have to keep an eye on the individual sizes of our partitioned backup jobs to make sure they are still reasonably balanced.

Future Work:
------------

It appears that such trickery might be unnecessary if the Bacula FD could perform something similar (hiding the latency of individual meta-data operations) on-the-fly, e.g. by executing in a multi-threaded fashion. This has been proposed as Item 15 in the Bacula `Projects' list since November 2005 but does not appear to have been implemented yet (?).

We hope this contribution helps other Bacula users and welcome comments on similar experiences, or even better ways to achieve higher speed-ups on the same hardware.

Many thanks go to Florian Stock for coming up with the idea and collecting the data!

Best,
Andreas Koch