Hi!

TL;DR: When backing up or migrating clients with a high volume of files, performance drops to a very low level. This can be traced back to the database inserts.
Our server:

  Dell PE2900, 4-core Xeon 2 GHz, 4 GB RAM
  Red Hat Enterprise Linux 5.3 64-bit
    (2.6.18-128.1.1.el5 #1 SMP Mon Jan 26 13:58:24 EST 2009 x86_64 GNU/Linux)
  PERC 5/i     - 200 GB SAS RAID-5 (system + database)
  PERC 4e/DC   - 2x 350 GB RAID-5 disk enclosure (MySQL temp & customer data)
  LSI1020      - 4x LTO-1 streamers
  QLA200       - 5 TB Fibre Channel SAN with SATA drives (Bacula storage)
  Bacula 2.4.4 - batch insert enabled
  MySQL 5.0.45

We have been using Bacula for the past 5 years, I think; my company actually financed the implementation of the migration feature. We recently upgraded our server to RHEL 5.3 64-bit in the hope that this might solve our problems, but I'm afraid that is not the case.

What the server CAN do, backing up 20 concurrent clients:

  - an average to-disk backup speed of 30 MB/s
  - 1,500 - 2,000 database inserts per second

Of course this varies depending on the client data being backed up and on client performance. For example, this is from the full backup of our fileserver:

  Elapsed time:           1 day 30 mins 43 secs
  FD Files Written:       1,664,826
  SD Files Written:       1,664,826
  FD Bytes Written:       200,165,438,355 (200.1 GB)
  SD Bytes Written:       200,453,874,910 (200.4 GB)
  Rate:                   2268.3 KB/s
  Software Compression:   35.8 %

It's not exactly quick, but acceptable for a full backup with compression, checksums, ACLs etc. This data is on our central SAN.

Now let's take a look at another server. This data is also on the same SAN as our fileserver:

  Elapsed time:           14 hours 7 mins 45 secs
  FD Files Written:       6,383,475
  SD Files Written:       6,383,475
  FD Bytes Written:       7,504,478,849 (7.504 GB)
  SD Bytes Written:       8,613,076,970 (8.613 GB)
  Rate:                   147.5 KB/s
  Software Compression:   46.5 %

You can see there is quite a big difference between the two. Even though this job is a LOT smaller (not even 5% of the size), it has 4x the number of files, and you can see the pitiful rate at which the backup runs. This isn't nice, but it is acceptable for a full backup that runs on the weekend. Bacula will get slow when lots of files are in one directory, but I think that is mainly a result of filesystem performance (ever tried `ls` in a directory with 1+ million files?).

Now let's see how those two fare during migration to tape. First the fileserver:

  Elapsed time:           5 hours 55 mins 28 secs
  SD Files Written:       1,635,930
  SD Bytes Written:       197,822,546,953 (197.8 GB)
  Rate:                   9275.3 KB/s

6 hours at an average rate of 9 MB/s is very nice. And the other server:

  Elapsed time:           8 hours 4 mins
  Priority:               10
  SD Files Written:       6,258,643
  SD Bytes Written:       8,497,756,378 (8.497 GB)
  Rate:                   292.6 KB/s

It took 2 hours MORE for 8.5 GB instead of 200 GB, at a VERY bad rate for our tape drive. Quite obviously the problem is the high volume of files.

Attribute spooling helps with tape wear, but it makes the whole situation worse. The following is from an earlier migration, with attribute spooling enabled and before we had cleaned up the number of files a bit:

  Elapsed time:           20 hours 41 mins 24 secs
  SD Files Written:       11,636,812
  SD Bytes Written:       9,112,130,555 (9.112 GB)
  Rate:                   122.3 KB/s

If you take a closer look at that migration, you can see that the tape-write part was done quite quickly (in 10 minutes, in fact):

  12-Mar 12:48 backup-sd JobId 30505: Ready to read from volume
  12-Mar 12:58 backup-sd JobId 30505: End of all volumes.

But after that follows this:

  12-Mar 12:58 backup-sd JobId 30505: Sending spooled attrs to the Director. Despooling 4,111,322,637 bytes ...
  13-Mar 09:36 backup-dir JobId 30505: Bacula backup-dir 2.4.4 (28Dec08):

Despooling the attributes into the database takes roughly 20.5 hours.
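Some back-of-the-envelope math on that despool, using the numbers from the job report above: 11,636,812 attribute rows in roughly 20.5 hours (~73,800 seconds) works out to about 157 inserts per second. At the 1,500 - 2,000 inserts per second the very same server manages during backups, the despool would finish in roughly two hours instead.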
The time is split roughly 2/3 writing to the batch table and 1/3 committing the batch table to the database proper. This in itself wouldn't be such a big problem if the job didn't block the tape devices and other migration jobs from running. You have to know that we have 12 clients with a file volume of 5 - 12 million files each. Lately we have run into the problem that one week is not enough time to migrate all data to tape, so in the end we never finish migrating (and can never restart the SD, because it is in use 100% of the time).

Using mytop I observed the database during the migration. It doesn't matter whether spooling is on or off: it writes at a steady 240 queries per second to the database, and with that many files that takes a while (6,258,643 rows at 240 queries per second is over 7 hours on its own, which matches the elapsed time above).

Is there anything that can be done about this? Is this a MySQL limitation or a Bacula limitation? What are your experiences with a high volume of files?

Things one might think about: why does a migration job insert the complete file attributes again? The data is already in the database, just linked to the original backup job. After the migration has run, the on-disk data cannot be accessed anymore, so why keep its attribute rows? Or even better, why not simply UPDATE them so that they link to the migration job? That should go a LOT quicker than doing a complete insert again and would remove this bottleneck entirely (a sketch of what I mean follows as a P.S. below my signature). I'm afraid I'm not a programmer, or I would take a look into this myself. I actually did look at the sources, but they might as well be encrypted for all I can make of them :)

If you need any more information on this I will try to get it. Just let me know.

Best regards,
Daniel Holtkamp

--
.............................................................
Riege Software International GmbH    Fon: +49 (2159) 9148 0
Mollsfeld 10                         Fax: +49 (2159) 9148 11
40670 Meerbusch                      Web: www.riege.com
Germany                              E-Mail: holtk...@riege.com

Handelsregister:                     Managing Directors:
Amtsgericht Neuss HRB-NR 4207        Christian Riege
USt-ID-Nr.: DE120585842              Gabriele Riege
                                     Johannes Riege
.............................................................
        YOU CARE FOR FREIGHT, WE CARE FOR YOU
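P.S.: To make the UPDATE idea concrete, here is a sketch. It is purely hypothetical and untested: the job ids are made up for illustration, and the table/column names come from my layman's reading of the Bacula catalog schema (a File table keyed by JobId), so treat them as assumptions rather than working code:

  -- Hypothetical sketch: after a successful migration, re-link the
  -- existing attribute rows to the new job instead of re-inserting
  -- millions of rows. Job ids below are invented for illustration.
  SET @old_jobid = 30504;  -- original backup job
  SET @new_jobid = 30505;  -- migration job

  UPDATE File
     SET JobId = @new_jobid
   WHERE JobId = @old_jobid;

A single UPDATE over an indexed JobId column should be vastly cheaper than millions of individual INSERTs, though I have no idea what other bookkeeping (JobMedia records and so on) Bacula would have to adjust alongside it.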