Hello,

We used Bacula 2.4.3 on Debian Etch for a long time and upgraded to 2.4.4 two days ago.
Since then, for the second day in a row, we have had a problem with automatic volume recycling. We back up to disk and have defined a pool as follows, so that every job ends up on its own volume:

Pool {
  Name = DiskPool
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 40 days
  Maximum Volume Jobs = 1
  Label Format = BAC-
}

The default job definition, which all our clients use, refers to this DiskPool. This worked before, but since upgrading to 2.4.4 all scheduled jobs suddenly seem to end up on the same volume. This is the status of the director:

bacula-srv-dir Version: 2.4.4 (28 December 2008) i486-pc-linux-gnu debian 4.0
Daemon started 26-Jan-09 16:52, 54 Jobs run since started.
 Heap: heap=13,135,872 smbytes=750,907 max_bytes=857,133 bufs=4,950 max_bufs=5,224

Scheduled Jobs:
Level          Type     Pri  Scheduled          Name        Volume
===================================================================================
Incremental    Backup    10  27-Jan-09 20:05    job_srv1    BAC-1280
Incremental    Backup    10  27-Jan-09 20:05    job_srv2    BAC-1280
Incremental    Backup    10  27-Jan-09 20:05    job_srv3    BAC-1280
Incremental    Backup    10  27-Jan-09 20:05    job_srv4    BAC-1280
Incremental    Backup    10  27-Jan-09 20:05    job_srv5    BAC-1280
Incremental    Backup    10  27-Jan-09 20:05    job_srv6    BAC-1280
Incremental    Backup    10  27-Jan-09 20:05    job_srv7    BAC-1280
...

As you can see, all jobs now go to volume BAC-1280. Furthermore, the SD starts to block when too many jobs are waiting for the volume.

Running Jobs:
 JobId  Level    Name                              Status
======================================================================
  9449  Increme  job_srv1.2009-01-27_04.05.00.46   is waiting on Storage File
  9450  Increme  job_srv2.2009-01-27_04.05.00.47   is waiting on Storage File
  9451  Increme  job_srv3.2009-01-27_04.05.00.48   is waiting on Storage File
  9452  Increme  job_srv4.2009-01-27_04.05.00.49   is waiting on Storage File
  9453  Increme  job_srv5.2009-01-27_04.05.00.50   is waiting on Storage File
...
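For anyone who wants to check for the same symptom on their own director, here is a rough Python sketch (not part of Bacula) that tallies how many scheduled jobs point at each volume. The sample rows are copied from the Scheduled Jobs listing above; the function names `jobs_per_volume` and `shared_volumes` and the assumption that the columns are whitespace-separated are mine, so treat it as an illustration only.

```python
import re
from collections import Counter

# Sample rows copied from the "Scheduled Jobs" listing in this post.
STATUS_ROWS = """\
Level          Type     Pri  Scheduled          Name        Volume
===================================================================
Incremental    Backup    10  27-Jan-09 20:05    job_srv1    BAC-1280
Incremental    Backup    10  27-Jan-09 20:05    job_srv2    BAC-1280
Incremental    Backup    10  27-Jan-09 20:05    job_srv3    BAC-1280
"""

# Assumed layout: level, type, priority, date, time, job name, volume,
# all separated by whitespace. Header and separator rows do not match
# because the priority field must be numeric.
ROW = re.compile(
    r"^(?P<level>\S+)\s+(?P<type>\S+)\s+(?P<pri>\d+)\s+"
    r"(?P<date>\S+)\s+(?P<time>\S+)\s+(?P<name>\S+)\s+(?P<volume>\S+)$"
)


def jobs_per_volume(text):
    """Return a Counter mapping volume name to number of scheduled jobs."""
    counts = Counter()
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            counts[match.group("volume")] += 1
    return counts


def shared_volumes(text, max_jobs=1):
    """Return the volumes listed for more than max_jobs scheduled jobs."""
    counts = jobs_per_volume(text)
    return {vol: n for vol, n in counts.items() if n > max_jobs}


if __name__ == "__main__":
    for volume, n in sorted(shared_volumes(STATUS_ROWS).items()):
        print(f"{volume}: {n} scheduled jobs")
```

The `max_jobs` default of 1 mirrors the Maximum Volume Jobs = 1 setting in the pool definition; any volume the sketch reports is listed for more jobs than that.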
Additional jobs are then waiting "on max Storage jobs" (the SD is currently limited to 20 concurrent jobs). When I strace bacula-sd, it just hangs around:

child_stack=0xb690b4c4, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED, parent_tidptr=0xb690bbf8, {entry_number:6, base_addr:0xb690bbb0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb690bbf8) = 5400
[pid 8762] select(5, [4], NULL, NULL, NULL <unfinished ...>
[pid 5400] gettimeofday({1233041312, 545094}, {4294967236, 0}) = 0
[pid 5400] read(24, "", 4) = 0
[pid 5400] time(NULL) = 1233041312
[pid 5400] stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=801, ...}) = 0
[pid 5400] close(24) = 0
[pid 5400] gettimeofday({1233041312, 545604}, {4294967236, 0}) = 0
[pid 5400] clock_gettime(CLOCK_REALTIME, {1233041312, 545703646}) = 0
[pid 5400] futex(0x80c5d3c, FUTEX_WAIT, 427, {1, 454296354} <unfinished ...>
[pid 8765] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 8765] futex(0x80c6b8c, FUTEX_WAKE, 1) = 0
[pid 8765] time(NULL) = 1233041313
[pid 8765] gettimeofday({1233041313, 691712}, {4294967236, 0}) = 0
[pid 8765] time(NULL) = 1233041313
[pid 8765] clock_gettime(CLOCK_REALTIME, {1233041313, 691982419}) = 0
[pid 8765] futex(0x80c5ac4, FUTEX_WAIT, 3783, {29, 999729581} <unfinished ...>
[pid 5400] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 5400] futex(0x80c5d20, FUTEX_WAKE, 1) = 0
[pid 5400] _exit(0) = ?
Process 5400 detached
[pid 8765] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 8765] futex(0x80c6b8c, FUTEX_WAKE, 1) = 0
[pid 8765] time(NULL) = 1233041343
[pid 8765] gettimeofday({1233041343, 699190}, {4294967236, 0}) = 0
[pid 8765] time(NULL) = 1233041343
[pid 8765] clock_gettime(CLOCK_REALTIME, {1233041343, 699274056}) = 0
[pid 8765] futex(0x80c5ac4, FUTEX_WAIT, 3785, {29, 999915944}) = -1 ETIMEDOUT (Connection timed out)
[pid 8765] futex(0x80c6b8c, FUTEX_WAKE, 1) = 0
[pid 8765] time(NULL) = 1233041373
[pid 8765] gettimeofday({1233041373, 703431}, {4294967236, 0}) = 0
[pid 8765] time(NULL) = 1233041373
[pid 8765] clock_gettime(CLOCK_REALTIME, {1233041373, 703683524}) = 0
[pid 8765] futex(0x80c5ac4, FUTEX_WAIT, 3787, {29, 999747476}) = -1 ETIMEDOUT (Connection timed out)
[pid 8765] futex(0x80c6b8c, FUTEX_WAKE, 1) = 0

It has many open connections from bacula-dir and just two connections to clients, which are already in CLOSE_WAIT state:

bacula-sd 8762 bacula   4u  IPv4  502281  TCP bacula-srv:bacula-sd (LISTEN)
bacula-sd 8762 bacula   5u  IPv4 1230722  TCP bacula-srv:bacula-sd->bacula-srv:49618 (CLOSE_WAIT)
bacula-sd 8762 bacula   6u  IPv4 1230056  TCP bacula-srv:bacula-sd->bacula-srv:49615 (ESTABLISHED)
bacula-sd 8762 bacula   7u  IPv4 1229112  TCP bacula-srv:bacula-sd->bacula-srv:49608 (ESTABLISHED)
bacula-sd 8762 bacula   8u  IPv4 1230581  TCP bacula-srv:bacula-sd->bacula-srv:49617 (ESTABLISHED)
bacula-sd 8762 bacula   9u  IPv4 1231672  TCP bacula-srv:bacula-sd->bacula-srv:49627 (ESTABLISHED)
bacula-sd 8762 bacula  10u  IPv4 1229191  TCP bacula-srv:bacula-sd->bacula-srv:49612 (ESTABLISHED)
bacula-sd 8762 bacula  11u  sock     0,5 1231512  can't identify protocol
bacula-sd 8762 bacula  12u  IPv4 1229168  TCP bacula-srv:bacula-sd->bacula-srv:49611 (ESTABLISHED)
bacula-sd 8762 bacula  13u  IPv4 1230075  TCP bacula-srv:bacula-sd->bacula-srv:49616 (ESTABLISHED)
bacula-sd 8762 bacula  14u  IPv4 1231692  TCP bacula-srv:bacula-sd->bacula-srv:49628 (ESTABLISHED)
bacula-sd 8762 bacula  15u  IPv4 1229203  TCP bacula-srv:bacula-sd->bacula-srv:49613 (ESTABLISHED)
bacula-sd 8762 bacula  16u  IPv4 1474050  TCP bacula-srv:bacula-sd->bacula-srv:44644 (ESTABLISHED)
bacula-sd 8762 bacula  17u  IPv4 1229603  TCP bacula-srv:bacula-sd->bacula-srv:49614 (ESTABLISHED)
bacula-sd 8762 bacula  18u  IPv4 1474101  TCP bacula-srv:bacula-sd->bacula-srv:44646 (ESTABLISHED)
bacula-sd 8762 bacula  19u  IPv4 1231538  TCP bacula-srv:bacula-sd->bacula-srv:49626 (ESTABLISHED)
bacula-sd 8762 bacula  20u  IPv4 1228175  TCP bacula-srv:bacula-sd->bacula-srv:49601 (ESTABLISHED)
bacula-sd 8762 bacula  21u  IPv4 1231523  TCP bacula-srv:bacula-sd->srv1:51353 (CLOSE_WAIT)
bacula-sd 8762 bacula  22u  IPv4 1231524  TCP bacula-srv:bacula-sd->srv2:56544 (CLOSE_WAIT)
bacula-sd 8762 bacula  31u  IPv4 1228181  TCP bacula-srv:bacula-sd->bacula-srv:49602 (ESTABLISHED)
bacula-sd 8762 bacula  32u  IPv4 1228182  TCP bacula-srv:bacula-sd->bacula-srv:49603 (ESTABLISHED)
bacula-sd 8762 bacula  33u  IPv4 1228184  TCP bacula-srv:bacula-sd->bacula-srv:49604 (ESTABLISHED)
bacula-sd 8762 bacula  34u  IPv4 1228187  TCP bacula-srv:bacula-sd->bacula-srv:49605 (ESTABLISHED)
bacula-sd 8762 bacula  35u  IPv4 1228190  TCP bacula-srv:bacula-sd->bacula-srv:49606 (ESTABLISHED)
bacula-sd 8762 bacula  37u  IPv4 1228193  TCP bacula-srv:bacula-sd->bacula-srv:49607 (ESTABLISHED)

Is it possible that the deadlock fixes introduced in 2.4.4 brought up some other problems?

Regards,
Andreas

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users