User optiz0r on IRC helped me get trace files for all daemons; they are at http://filebin.ca/3HoXMMcEo2rv/traces.tar.gz
The configuration used may be slightly different (the only difference I can think of is setting Attribute Spooling = yes). We noticed the following errors:

bacst1-sd.trace:bacst1-sd: device.c:232-1 getvolinfo failed. No new Vol: Error getting Volume info: 1998 Volume "bacst1_storage-full-vol-0001" catalog status is Used, but should be Append, Purged or Recycle.

In this run, the error is reported for every volume except bacdir1_director-full-vol-0005, which is also the only volume whose status is not Used (it is Append). Maybe that is significant?

On 29.3.2017 at 16:42, Zdeněk Bělehrádek wrote:
> Hi,
>
> We are using Bacula to back up our company's data. All storages are
> ordinary Debian Jessie Linux servers with spinning disks; we don't use
> tapes. Bacula version is 7.0.5+dfsg-4~bpo80+1 and
> 7.4.3+dfsg-1+sid1~bpo8+1 (we tried both).
>
> We need 2 copies of each backup placed in separate datacenters, so we
> run periodic Copy jobs to mirror data between storages. We want to use
> odd-numbered storages to make a backup, and then copy it to an
> even-numbered storage.
>
> Our current configuration suffers from occasional deadlocks when Bacula
> tries to read from and write to a single storage. I thought it was probably
> caused by a mistake in the config, where storages have the same Media Type (as
> documented at
> http://www.bacula.org/7.4.x-manuals/en/main/Migration_Copy.html#SECTION002830000000000000000
> ).
>
> For this reason we decided to create a new config where every storage has
> a different type from every other. When I tested this new config in a
> testing environment, jobs got stuck and never finished.
> status storage=bacst2-stor showed:
>
> Device is BLOCKED waiting to create a volume for:
> Pool: zdenek-test-pp_old-full-pool-mirror
> Media type: File-storspec-mirror
> Available Space=5.323 GB
>
> and never made progress - the device is unusable for all jobs (they
> simply wait). I tried to mount and label a new volume; it didn't make any
> difference.
> The only thing that helps is to restart the storage daemon,
> which makes the stuck job fail.
>
> An strace of the storage daemon on bacst2 revealed that the director
> connects to it, both authenticate to each other, and the storage sends
> "\0\0\0\0223000 OK Hello 305\n" to the director. The storage then reads
> from the socket and never gets any reply - the thread just blocks in the
> read() syscall indefinitely.
>
> An strace of the director confirms this - a thread connects to the storage,
> authenticates, reads the Hello and then never replies. Instead it opens
> communication with bacst1 and starts sending commands. Even after
> several minutes (test backups are several KB in size and usually
> finish in a few seconds) the network socket to bacst2 is still open and
> no communication is taking place.
>
> I verified this with tcpdump and there's nothing suspicious - the
> connection works normally, and the last packet sent is the Hello message
> described above. Communication on that four-tuple then simply stops:
> nobody sends anything, and the connection is never closed.
> There is no firewall or NAT between the servers - they are connected to
> a single internal network.
>
> I also tried to upgrade our 7.0 install to the latest 7.4 from Debian;
> the results are exactly the same.
>
> Configuration and strace output are at:
> https://drive.google.com/file/d/0B4bjslETcBa-ZHVkOHU4dlZCZ2s/view?usp=sharing
>
> I can reliably replicate the issue by running (on the director):
>
> for i in `seq 1 2` ; do
> for job in bacst1_storage-job --bacst1_storage-incremental-job-mirror \
> --bacst1_storage-full-job-mirror bacdir1_director-job \
> --bacdir1_director-incremental-job-mirror \
> --bacdir1_director-full-job-mirror ; do
> echo "run job=$job yes" | bacula-console ; done ; done
>
> Is this a known problem? Is there any workaround?
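
P.S. For anyone going through the traces: a rough way to see which volumes trigger the "catalog status" error is a small sed filter like the sketch below. It assumes every such line has the same shape as the one quoted at the top of this message; the function name summarize_volstatus_errors is made up for illustration and is not part of Bacula.

```shell
# Hypothetical helper (not a Bacula tool): read storage daemon trace lines on
# stdin and print "<volume> <catalog status>" once per distinct pair.
# Assumes the error lines look like the one quoted in this message.
summarize_volstatus_errors() {
    sed -n 's/.*1998 Volume "\([^"]*\)" catalog status is \([A-Za-z]*\),.*/\1 \2/p' | sort -u
}

# Example using the error line quoted above.
printf '%s\n' 'bacst1-sd: device.c:232-1 getvolinfo failed. No new Vol: Error getting Volume info: 1998 Volume "bacst1_storage-full-vol-0001" catalog status is Used, but should be Append, Purged or Recycle.' \
    | summarize_volstatus_errors
# prints: bacst1_storage-full-vol-0001 Used
```

With the unpacked tarball, this would be run as `summarize_volstatus_errors < bacst1-sd.trace`. Lines that do not match the pattern are silently skipped, so an empty result only means no line of that exact shape was found.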
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users