On my main backup server, running Bacula 5.2.12 on CentOS 6.3 and backing up 18 clients to an LTO-3 autochanger, I recently activated concurrent jobs by setting Maximum Concurrent Jobs to 10 in the Director and Storage resources. This seemed to work quite well for a couple of days.
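In case it matters, the concurrency settings were added roughly as sketched below. The resource names and the directives omitted as "..." are only placeholders, not my literal configuration; the Maximum Concurrent Jobs values are as stated above. All jobs also spool their data to disk before writing to tape, as the log excerpts further down show.

  # bacula-dir.conf (illustrative excerpt, names are placeholders)
  Director {
    Name = backup-dir
    ...
    Maximum Concurrent Jobs = 10
  }

  Storage {
    Name = LTO-3-changer
    Autochanger = yes
    ...
    Maximum Concurrent Jobs = 10
  }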
Last night the tape was full, and Bacula dutifully continued with the next one. The jobs following the tape change, however, produced somewhat disquieting logs.

- The nightly run started at 22:05 on tape Daily_2.

- The first six backups (in order of completion) went on that tape.

- The seventh job (as it happened, the one backing up the backup server itself, which by coincidence got promoted from Incremental to Full because of a Fileset modification) reported the tape change:

  30-Jan 20:15 bacula-sd JobId 6142: Committing spooled data to Volume "Daily_2". Despooling 4,855,877,581 bytes ...
  30-Jan 20:18 bacula-sd JobId 6142: End of Volume "Daily_2" at 319:3229 on device "LTO-3" (/dev/nst0). Write of 64512 bytes got -1.
  30-Jan 20:18 bacula-sd JobId 6142: Re-read of last block succeeded.
  30-Jan 20:18 bacula-sd JobId 6142: End of medium on Volume "Daily_2" Bytes=799,806,099,456 Blocks=12,397,787 at 30-Jan-2013 20:18.
  30-Jan 20:18 bacula-sd JobId 6142: 3307 Issuing autochanger "unload slot 3, drive 0" command.
  30-Jan 20:19 bacula-sd JobId 6142: 3304 Issuing autochanger "load slot 2, drive 0" command.
  30-Jan 20:20 bacula-sd JobId 6142: 3305 Autochanger "load slot 2, drive 0", status is OK.
  30-Jan 20:20 bacula-sd JobId 6142: Recycled volume "Daily_1" on device "LTO-3" (/dev/nst0), all previous data lost.
  30-Jan 20:20 bacula-sd JobId 6142: New volume "Daily_1" mounted on device "LTO-3" (/dev/nst0) at 30-Jan-2013 20:20.
  30-Jan 20:21 bacula-sd JobId 6142: Despooling elapsed time = 00:02:51, Transfer rate = 28.39 M Bytes/second

- The next four jobs (again in order of completion) did however still report writing to tape Daily_2, the last one apparently even starting to use it well after it had been unloaded:

  30-Jan 20:15 bacula-sd JobId 6156: Committing spooled data to Volume "Daily_2". Despooling 1,602,040,011 bytes ...
  30-Jan 20:22 bacula-sd JobId 6156: Despooling elapsed time = 00:00:30, Transfer rate = 53.40 M Bytes/second
  30-Jan 20:16 bacula-sd JobId 6147: Committing spooled data to Volume "Daily_2". Despooling 2,558,276,008 bytes ...
  30-Jan 20:24 bacula-sd JobId 6147: Despooling elapsed time = 00:00:43, Transfer rate = 59.49 M Bytes/second
  30-Jan 20:18 bacula-sd JobId 6154: Committing spooled data to Volume "Daily_2". Despooling 1,441,633,135 bytes ...
  30-Jan 20:25 bacula-sd JobId 6154: Despooling elapsed time = 00:00:24, Transfer rate = 60.06 M Bytes/second
  30-Jan 20:20 bacula-sd JobId 6150: Committing spooled data to Volume "Daily_2". Despooling 5,609,655,403 bytes ...
  30-Jan 20:27 bacula-sd JobId 6150: Despooling elapsed time = 00:01:41, Transfer rate = 55.54 M Bytes/second

- The job after that went to tape Daily_1 as expected:

  30-Jan 20:22 bacula-sd JobId 6155: Committing spooled data to Volume "Daily_1". Despooling 4,379,708,893 bytes ...
  30-Jan 20:33 bacula-sd JobId 6155: Despooling elapsed time = 00:03:09, Transfer rate = 23.17 M Bytes/second

- Then, most disquieting, the next one reported that tape as previously written, and a file count mismatch:

  30-Jan 20:28 bacula-sd JobId 6152: Volume "Daily_1" previously written, moving to end of data.
  30-Jan 20:30 bacula-sd JobId 6152: Warning: For Volume "Daily_1": The number of files mismatch! Volume=4 Catalog=3 Correcting Catalog
  30-Jan 20:30 bacula-sd JobId 6152: Spooling data ...
  30-Jan 20:32 tourserv-fd JobId 6152: /var/lib/nfs/rpc_pipefs is a different filesystem. Will not descend from /var into it.
  30-Jan 20:33 bacula-sd JobId 6152: Committing spooled data to Volume "Daily_1". Despooling 5,469,441,198 bytes ...
  30-Jan 20:36 bacula-sd JobId 6152: Despooling elapsed time = 00:01:54, Transfer rate = 47.97 M Bytes/second

- The remaining jobs completed uneventfully, possibly because they were considerably delayed by long-running ClientRunBeforeJob scripts, so the spooling area had drained completely before they started sending data.

What has happened here? Should I worry?

aTdHvAaNnKcSe
Tilman

--
Tilman Schmidt
Phoenix Software GmbH
Bonn, Germany