This issue is weird, and I wonder if anyone else has seen anything like this.

We are using bacula with file storage, and the volume labelling comes from a variable expansion:

  LabelFormat = "${JobId}-${Client}-${Level}-${Pool}-${Year}-${Month:p/2/0/r}-${Day:p/2/0/r}"

After upgrading the director node from bacula 9.4.2 (Ubuntu 20.04) to bacula 13.0.4 (Ubuntu 24.04), we see strange behaviour where job B is creating a volume name using the JobID and client name belonging to job A. This in turn confuses job A, when it finds the volume it's trying to create already exists. It keeps retrying until the day rolls over, and it gets a unique name again.

Let me illustrate with one particular example. Job 92925 was backing up director.nsrc.org, and job 92926 was backing up drupal.nsrc.org.

bacula=# select jobid,job,name,clientid from job where jobid in (92925,92926);
 jobid |                   job                    | name        | clientid
-------+------------------------------------------+-------------------+----------
 92926 | drupal.nsrc.org.2025-01-25_21.00.00_35   | drupal.nsrc.org   |       40  92925 | director.nsrc.org.2025-01-25_21.00.00_34 | director.nsrc.org |       37
(2 rows)

bacula=# select * from client where clientid in (37,40);
 clientid |         name         | uname                       | autoprune | fileretention | jobretention
----------+----------------------+--------------------------------------------------+-----------+---------------+--------------
       37 | director.nsrc.org-fd | 9.4.2 (04Feb19) x86_64-pc-linux-gnu,ubuntu,20.04 |         1 |      31536000 |     31536000        40 | drupal.nsrc.org-fd   | 9.6.7 (10Dec20) x86_64-pc-linux-gnu,ubuntu,22.04 |         1 |      31536000 |     31536000
(2 rows)


OK so far. Now look at which media those jobs were using:

bacula=# select * from jobmedia where jobid=92925;
 jobmediaid | jobid | mediaid | firstindex | lastindex | startfile | endfile | startblock |  endblock  | volindex
------------+-------+---------+------------+-----------+-----------+---------+------------+------------+----------
     404997 | 92925 |   91486 |          1 |      5604 | 0 |       0 |        263 |  999936173 |        2      404998 | 92925 |   91486 |       5604 |     60865 | 0 |       0 |  999936174 | 1609426358 |        3
(2 rows)

bacula=# select * from jobmedia where jobid=92926;
 jobmediaid | jobid | mediaid | firstindex | lastindex | startfile | endfile | startblock |  endblock  | volindex
------------+-------+---------+------------+-----------+-----------+---------+------------+------------+----------
     404586 | 92926 |   91455 |          1 |     54633 | 0 |       0 |        263 |  999936044 |        2      404587 | 92926 |   91455 |      54633 |     66615 | 0 |       0 |  999936045 | 1999871925 |        3      404588 | 92926 |   91455 |      66615 |     71853 | 0 |       0 | 1999871926 | 2999807889 |        4      404589 | 92926 |   91455 |      71853 |    109430 | 0 |       0 | 2999807890 | 3999743805 |        5      404590 | 92926 |   91455 |     109430 |    114941 | 0 |       1 | 3999743806 |  704712444 |        6      404591 | 92926 |   91455 |     114941 |    115392 | 1 |       1 |  704712445 | 1704648325 |        7      404592 | 92926 |   91455 |     115392 |    115865 | 1 |       1 | 1704648326 | 2389364108 |        8
(7 rows)

They are different media, as expected. But the *names* of those volumes are both derived from "director.nsrc.org" and named as if they are from job 92925:

bacula=# select mediaid,volumename,voljobs from media where mediaid in (91455,91486);
 mediaid |                   volumename                    | voljobs
---------+-------------------------------------------------+---------
   91455 | 92925-director.nsrc.org-fd-Full-Full-2025-01-26 |       1
   91486 | 92925-director.nsrc.org-fd-Full-Full-2025-01-27 |       1
(2 rows)

This matches what's on the filesystem: there is no volume with name starting 92926.

# ls -1 /backup/bacula/9292[56]*
/backup/bacula/92925-director.nsrc.org-fd-Full-Full-2025-01-26
/backup/bacula/92925-director.nsrc.org-fd-Full-Full-2025-01-27
#


Here are the logs:

--------

# grep '9292[56]' /var/log/bacula/bacula.log
26-Jan 02:05 store.nsrc.org-dir JobId 92926: Start Backup JobId 92926, Job=drupal.nsrc.org.2025-01-25_21.00.00_35 26-Jan 02:05 store.nsrc.org-dir JobId 92926: Error: bsockcore.c:400 Source address bind error. proto=2. ERR=Invalid argument 26-Jan 02:05 store.nsrc.org-dir JobId 92926: Connected to Storage "FileStorage0" at store.nsrc.org:9103 with TLS 26-Jan 02:05 store.nsrc.org-dir JobId 92926: Using Device "FileDev0" to write. 26-Jan 02:05 store.nsrc.org-dir JobId 92926: Storage "FileStorage0" was selected out of group of 2 available storages. StorageGroupPolicy "ListedOrder" used 26-Jan 02:05 store.nsrc.org-dir JobId 92926: Error: bsockcore.c:400 Source address bind error. proto=2. ERR=Invalid argument 26-Jan 02:05 store.nsrc.org-dir JobId 92926: Connected to Client "drupal.nsrc.org-fd" at drupal.nsrc.org:9102 without encryption *26-Jan 02:05 store.nsrc.org-sd JobId 92926: Labeled new Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" on File device "FileDev0" (/backup/bacula).* 26-Jan 02:05 store.nsrc.org-sd JobId 92926: Wrote label to prelabeled Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" on File device "FileDev0" (/backup/bacula) 26-Jan 02:05 store.nsrc.org-dir JobId 92926: Volume used once. Marking Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" as Used. 26-Jan 02:20 store.nsrc.org-sd JobId 92926: Elapsed time=00:15:12, Transfer rate=7.316 M Bytes/second 26-Jan 02:20 store.nsrc.org-sd JobId 92926: Sending spooled attrs to the Director. Despooling 29,199,872 bytes ... 26-Jan 02:20 store.nsrc.org-dir JobId 92926: Bacula store.nsrc.org-dir 13.0.4 (12Feb24):
  JobId:                  92926
  Volume name(s): 92925-director.nsrc.org-fd-Full-Full-2025-01-26
26-Jan 02:20 store.nsrc.org-dir JobId 92926: Begin pruning Jobs older than 1 year .
26-Jan 02:20 store.nsrc.org-dir JobId 92926: No Jobs found to prune.
26-Jan 02:20 store.nsrc.org-dir JobId 92926: Begin pruning Files.
26-Jan 02:20 store.nsrc.org-dir JobId 92926: No Files found to prune.
26-Jan 02:20 store.nsrc.org-dir JobId 92926: End auto prune.
26-Jan 02:04 store.nsrc.org-dir JobId 92925: Start Backup JobId 92925, Job=director.nsrc.org.2025-01-25_21.00.00_34 26-Jan 02:04 store.nsrc.org-dir JobId 92925: Error: bsockcore.c:400 Source address bind error. proto=2. ERR=Invalid argument 26-Jan 02:04 store.nsrc.org-dir JobId 92925: Connected to Storage "FileStorage0" at store.nsrc.org:9103 with TLS 26-Jan 02:04 store.nsrc.org-dir JobId 92925: Created new Volume="92925-director.nsrc.org-fd-Full-Full-2025-01-26", Pool="Full", MediaType="File0" in catalog. *26-Jan 02:04 store.nsrc.org-dir JobId 92925: Storage daemon "FileStorage0" didn't accept Device "FileDev0" because: 3924 Device "FileDev0" not in SD Device resources or no matching Media Type or is disabled.* 26-Jan 02:04 store.nsrc.org-dir JobId 92925: Error: bsockcore.c:400 Source address bind error. proto=2. ERR=Invalid argument 26-Jan 02:04 store.nsrc.org-dir JobId 92925: Connected to Storage "FileStorage1" at store.nsrc.org:9103 with TLS *26-Jan 02:04 store.nsrc.org-dir JobId 92925: Error: sql_create.c:430 Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" already exists.* 26-Jan 02:04 store.nsrc.org-dir JobId 92925: Using Device "FileDev1" to write. 26-Jan 02:04 store.nsrc.org-dir JobId 92925: Storage "FileStorage1" was selected out of group of 2 available storages. StorageGroupPolicy "ListedOrder" used 26-Jan 02:04 store.nsrc.org-dir JobId 92925: Error: bsockcore.c:400 Source address bind error. proto=2. ERR=Invalid argument 26-Jan 02:04 store.nsrc.org-dir JobId 92925: Connected to Client "director.nsrc.org-fd" at director.nsrc.org:9102 without encryption 26-Jan 02:04 store.nsrc.org-dir JobId 92925: Error: sql_create.c:430 Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" already exists. 26-Jan 02:04 store.nsrc.org-dir JobId 92925: Error: sql_create.c:430 Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" already exists. 26-Jan 02:04 store.nsrc.org-sd JobId 92925: Job director.nsrc.org.2025-01-25_21.00.00_34 is waiting. Cannot find any appendable volumes. 26-Jan 02:09 store.nsrc.org-dir JobId 92925: Error: sql_create.c:430 Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" already exists. 26-Jan 02:14 store.nsrc.org-dir JobId 92925: Error: sql_create.c:430 Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" already exists.
<<snip lots>>
26-Jan 23:54 store.nsrc.org-dir JobId 92925: Error: sql_create.c:430 Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" already exists. 26-Jan 23:59 store.nsrc.org-dir JobId 92925: Error: sql_create.c:430 Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" already exists. 27-Jan 00:04 store.nsrc.org-dir JobId 92925: Created new Volume="92925-director.nsrc.org-fd-Full-Full-2025-01-27", Pool="Full", MediaType="File1" in catalog. 27-Jan 00:04 store.nsrc.org-sd JobId 92925: Labeled new Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-27" on File device "FileDev1" (/backup/bacula). 27-Jan 00:04 store.nsrc.org-sd JobId 92925: Wrote label to prelabeled Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-27" on File device "FileDev1" (/backup/bacula) 27-Jan 00:04 store.nsrc.org-dir JobId 92925: Volume used once. Marking Volume "92925-director.nsrc.org-fd-Full-Full-2025-01-27" as Used. 27-Jan 00:10 store.nsrc.org-sd JobId 92925: Elapsed time=00:06:00, Transfer rate=4.460 M Bytes/second 27-Jan 00:10 store.nsrc.org-sd JobId 92925: Sending spooled attrs to the Director. Despooling 14,038,445 bytes ... 27-Jan 00:10 store.nsrc.org-dir JobId 92925: Bacula store.nsrc.org-dir 13.0.4 (12Feb24):
  JobId:                  92925
  Volume name(s): 92925-director.nsrc.org-fd-Full-Full-2025-01-27
27-Jan 00:10 store.nsrc.org-dir JobId 92925: Begin pruning Jobs older than 1 year .
27-Jan 00:10 store.nsrc.org-dir JobId 92925: No Jobs found to prune.
27-Jan 00:10 store.nsrc.org-dir JobId 92925: Begin pruning Files.
27-Jan 00:10 store.nsrc.org-dir JobId 92925: No Files found to prune.
27-Jan 00:10 store.nsrc.org-dir JobId 92925: End auto prune.

--------

What appears to be happening is:

- Job 92926 creates the volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" - even though its job ID is 92925, and the client is drupal.nsrc.org! - and successfully uses it.

- Job 92925 starts, and tries to create volume "92925-director.nsrc.org-fd-Full-Full-2025-01-26" - which is the correct name. But it finds that the volume already exists. It backs off and keeps retrying, hundreds of times

- Eventually it goes past midnight, and it tries to create volume "92925-director.nsrc.org-fd-Full-Full-2025-01-27", which succeeds, and the backup completes.

The "Source address bind error" comes from DirSourceAddress and I've separately raised this as an issue at
https://gitlab.bacula.org/bacula-community-edition/bacula-community/-/issues/2742

I don't *think* it's related, but I'll run without DirSourceAddress tonight just to be sure.

I don't know what the message

*Storage daemon "FileStorage0" didn't accept Device "FileDev0" because: 3924 Device "FileDev0" not in SD Device resources or no matching Media Type or is disabled.*

is telling me, and whether this is part of the problem - this message didn't appear prior to the upgrade.

I will provide the config files in a following mail, as my first attempt breached the Sourceforge message size limit.

Thanks,

Brian Candler.
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Reply via email to